AUTHORS’ PREFACE


The present course is based on lectures given by I. M. 
Gelfand in the Mechanics and Mathematics Department 
of Moscow State University. However, the book goes 
considerably beyond the material actually presented in 
the lectures. Our aim is to give a treatment of the elements
of the calculus of variations in a form which is
both easily understandable and sufficiently modern.
Considerable attention is devoted to physical applications
of variational methods, e.g., canonical equations,
variational principles of mechanics and conservation laws. 

The reader who merely wishes to become familiar with 
the most basic concepts and methods of the calculus of 
variations need only study the first chapter. The first 
three chapters, taken together, form a more comprehensive
course on the elements of the calculus of variations,
but one which is still quite elementary (involving
only necessary conditions for extrema). The first six 
chapters contain, more or less, the material given in 
the usual university course in the calculus of variations 
(with applications to the mechanics of systems with a 
finite number of degrees of freedom), including the 
theory of fields (presented in a somewhat novel way) 
and sufficient conditions for weak and strong extrema. 
Chapter 7 is devoted to the application of variational 
methods to the study of systems with infinitely many 
degrees of freedom. Chapter 8 contains a brief treatment
of direct methods in the calculus of variations.

The authors are grateful to M. A. Yevgrafov and A. G. 
Kostyuchenko, who read the book in manuscript and 
made many useful comments. 

I. M. G. 

S. V. F.




TRANSLATOR’S PREFACE 


This book is a modern introduction to the calculus of 
variations and certain of its ramifications, and I trust 
that its fresh and lively point of view will serve to make 
it a welcome addition to the English-language literature 
on the subject. The present edition is rather different 
from the Russian original. With the authors’ consent, 
I have given free rein to the tendency of any mathematically
educated translator to assume the functions
of annotator and stylist. In so doing, I have had two 
special assets: 1) A substantial list of revisions and 
corrections from Professor S. V. Fomin himself, and 
2) A variety of helpful suggestions from Professor J. T. 
Schwartz of New York University, who read the entire 
translation in typescript. 

The problems appearing at the end of each of the eight 
chapters and two appendices were made specifically for 
the English edition, and many of them comment further 
on the corresponding parts of the text. A variety of 
Russian sources have played an important role in the 
synthesis of this material. In particular, I have consulted 
the textbooks on the calculus of variations by N. I. 
Akhiezer, by L. E. Elsgolts, and by M. A. Lavrentev 
and L. A. Lyusternik, as well as Volume 2 of the well- 
known problem collection by N. M. Gyunter and R. O. 
Kuzmin, and Chapter 3 of G. E. Shilov’s “Mathematical 
Analysis, A Special Course.” 

At the end of the book I have added a Bibliography 
containing suggestions for collateral and supplementary 
reading. This list is not intended as an exhaustive catalog
of the literature, and is in fact confined to books
available in English. 

R. A. S.
ELEMENTS
OF THE THEORY 


1. Functionals. Some Simple Variational Problems

Variable quantities called functionals play an important role in many 
problems arising in analysis, mechanics, geometry, etc. By a functional, we
mean a correspondence which assigns a definite (real) number to each function 
(or curve) belonging to some class. Thus, one might say that a functional is 
a kind of function, where the independent variable is itself a function (or 
curve). The following are examples of functionals: 

1. Consider the set of all rectifiable plane curves. 1 A definite number is 
associated with each such curve, namely, its length. Thus, the length 
of a curve is a functional defined on the set of rectifiable curves. 

2. Suppose that each rectifiable plane curve is regarded as being made 
out of some homogeneous material. Then if we associate with each 
such curve the ordinate of its center of mass, we again obtain a 
functional. 

3. Consider all possible paths joining two given points A and B in the 
plane. Suppose that a particle can move along any of these paths, 
and let the particle have a definite velocity v(x, y) at the point (x, y). 
Then we obtain a functional by associating with each path the time the 
particle takes to traverse the path. 


1 In analysis, the length of a curve is defined as the limiting length of a polygonal line 
inscribed in the curve (i.e., with vertices lying on the curve) as the maximum length of 
the chords forming the polygonal line goes to zero. If this limit exists and is finite, the 
curve is said to be rectifiable. 
4. Let y(x) be an arbitrary continuously differentiable function, defined
on the interval [a, b]. 2 Then the formula

$$\int_a^b y'^2(x)\,dx$$

defines a functional on the set of all such functions y(x).

5. As a more general example, let F(x, y, z) be a continuous function of
three variables. Then the expression

$$J[y] = \int_a^b F[x, y(x), y'(x)]\,dx, \tag{1}$$

where y(x) ranges over the set of all continuously differentiable functions
defined on the interval [a, b], defines a functional. By choosing
different functions F(x, y, z), we obtain different functionals. For
example, if

$$F(x, y, z) = \sqrt{1 + z^2},$$

J[y] is the length of the curve y = y(x), as in the first example, while if

$$F(x, y, z) = z^2,$$

J[y] reduces to the case considered in the fourth example. In what
follows, we shall be concerned mainly with functionals of the form (1).
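A functional of the form (1) can be evaluated numerically once F and the curve are given. The following is a minimal sketch, not part of the original text: the helper name `functional`, the midpoint rule, and the test curve y = x^2 on [0, 1] are all assumptions made for illustration.

```python
import math

def functional(F, y, dy, a, b, n=10_000):
    """Approximate J[y] = integral_a^b F(x, y(x), y'(x)) dx by the midpoint rule."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h          # midpoint of the i-th subinterval
        total += F(x, y(x), dy(x)) * h
    return total

def arc_length_integrand(x, y, z):
    # F(x, y, z) = sqrt(1 + z^2): J[y] is the length of the curve
    return math.sqrt(1.0 + z * z)

def squared_slope_integrand(x, y, z):
    # F(x, y, z) = z^2: J[y] is the functional of the fourth example
    return z * z

# Assumed test curve: y = x^2 on [0, 1], with derivative y' = 2x
y = lambda x: x * x
dy = lambda x: 2.0 * x

length = functional(arc_length_integrand, y, dy, 0.0, 1.0)
energy = functional(squared_slope_integrand, y, dy, 0.0, 1.0)
print(length, energy)
```

Changing only the integrand F changes the functional, while the curve and the quadrature stay the same, which mirrors the remark that different choices of F give different functionals.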

Particular instances of problems involving the concept of a functional 
were considered more than three hundred years ago, and in fact, the first 
important results in this area are due to Euler (1707-1783). Nevertheless, 
up to now, the “calculus of functionals” still does not have methods of a 
generality comparable to the methods of classical analysis (i.e., the ordinary 
“calculus of functions”). The most developed branch of the “calculus of 
functionals” is concerned with finding the maxima and minima of functionals, 
and is called the “calculus of variations.” Actually, it would be more 
appropriate to call this subject the “calculus of variations in the narrow 
sense,” since the significance of the concept of the variation of a functional 
is by no means confined to its applications to the problem of determining the 
extrema of functionals. 

We now indicate some typical examples of variational problems, by which
we mean problems involving the determination of maxima and minima of 
functionals. 

1. Find the shortest plane curve joining two points A and B, i.e., find the
curve y = y(x) for which the functional

$$\int_a^b \sqrt{1 + y'^2}\,dx$$

achieves its minimum. The curve in question turns out to be the straight
line segment joining A and B.


2 By [a, b] is meant the closed interval $a \leq x \leq b$.
2. Let A and B be two fixed points. Then the time it takes a particle to
slide under the influence of gravity along some path joining A and B
depends on the choice of the path (curve), and hence is a functional.
The curve such that the particle takes the least time to go from A to B
is called the brachistochrone. The brachistochrone problem was posed
by John Bernoulli in 1696, and played an important part in the development
of the calculus of variations. The problem was solved by John
Bernoulli, James Bernoulli, Newton, and L’Hospital. The brachistochrone
turns out to be a cycloid, lying in the vertical plane and passing
through A and B (cf. p. 26).

3. The following variational problem, called the isoperimetric problem,
was solved by Euler: Among all closed curves of a given length l, find the
curve enclosing the greatest area. The required curve turns out to be
a circle.

All of the above problems involve functionals which can be written in 
the form 

$$\int_a^b F(x, y, y')\,dx.$$

Such functionals have a “localization property” consisting of the fact that
if we divide the curve y = y(x) into parts and calculate the value of the
functional for each part, the sum of the values of the functional for the 
separate parts equals the value of the functional for the whole curve. It is 
just these functionals which are usually considered in the calculus of variations. 
As an example of a “nonlocal functional,” consider the expression 

$$\frac{\int_a^b x\sqrt{1 + y'^2}\,dx}{\int_a^b \sqrt{1 + y'^2}\,dx},$$

which gives the abscissa of the center of mass of a curve y = y(x), $a \leq x \leq b$,
made out of some homogeneous material.
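The difference between the two kinds of functional can be checked numerically. The sketch below is an illustration only: the curve y = x^2, the split point c = 0.4, and the helper names are assumptions. It compares the value on the whole interval with the sum of the values on two parts, once for arc length and once for the abscissa of the center of mass.

```python
import math

def integrate(f, a, b, n=20_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

# Assumed test curve y = x^2, so y' = 2x and ds = sqrt(1 + y'^2) dx
dy = lambda x: 2.0 * x
ds = lambda x: math.sqrt(1.0 + dy(x) ** 2)

def arc_length(a, b):
    return integrate(ds, a, b)

def centroid_abscissa(a, b):
    # Ratio of two integrals: the nonlocal functional from the text
    return integrate(lambda x: x * ds(x), a, b) / arc_length(a, b)

a, c, b = 0.0, 0.4, 1.0
local_gap = arc_length(a, b) - (arc_length(a, c) + arc_length(c, b))
nonlocal_gap = centroid_abscissa(a, b) - (
    centroid_abscissa(a, c) + centroid_abscissa(c, b)
)
print(local_gap, nonlocal_gap)
```

The first gap vanishes up to quadrature error, while the second does not, because a ratio of integrals is not additive over subintervals.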

An important factor in the development of the calculus of variations was 
the investigation of a number of mechanical and physical problems, e.g., 
the brachistochrone problem mentioned above. In turn, the methods of the 
calculus of variations are widely applied in various physical problems. It 
should be emphasized that the application of the calculus of variations to 
physics does not consist merely in the solution of individual, albeit very
important, problems. The so-called “variational principles,” to be discussed
in Chapters 4 and 7, are essentially a manifestation of very general physical 
laws, which are valid in diverse branches of physics, ranging from classical 
mechanics to the theory of elementary particles. 

To understand the basic meaning of the problems and methods of the 
calculus of variations, it is very important to see how they are related to 
problems of classical analysis, i.e., to the study of functions of n variables.
Thus, consider a functional of the form

$$J[y] = \int_a^b F(x, y, y')\,dx, \qquad y(a) = A, \quad y(b) = B.$$

Here, each curve is assigned a certain number. To find a related function
of the sort considered in classical analysis, we may proceed as follows.
Using the points

$$a = x_0, x_1, \ldots, x_n, x_{n+1} = b,$$

we divide the interval [a, b] into n + 1 equal parts. Then we replace the
curve y = y(x) by the polygonal line with vertices

$$(x_0, A), (x_1, y(x_1)), \ldots, (x_n, y(x_n)), (x_{n+1}, B),$$

and we approximate the functional J[y] by the sum

$$J(y_1, \ldots, y_n) = \sum_{i=1}^{n+1} F\left(x_i, y_i, \frac{y_i - y_{i-1}}{h}\right) h, \tag{2}$$

where

$$y_i = y(x_i), \qquad h = x_i - x_{i-1}.$$

Each polygonal line is uniquely determined by the ordinates $y_1, \ldots, y_n$ of
its vertices (recall that $y_0 = A$ and $y_{n+1} = B$ are fixed), and the sum (2)
is therefore a function of the n variables $y_1, \ldots, y_n$. Thus, as an approximation,
we can regard the variational problem as the problem of finding the
extrema of the function $J(y_1, \ldots, y_n)$. In solving variational problems,
Euler made extensive use of this method of finite differences. By replacing
smooth curves by polygonal lines, he reduced the problem of finding extrema
of a functional to the problem of finding extrema of a function of n variables,
and then he obtained exact solutions by passing to the limit as $n \to \infty$.
In this sense, functionals can be regarded as “functions of infinitely many
variables” [i.e., the values of the function y(x) at separate points], and the
calculus of variations can be regarded as the corresponding analog of
differential calculus.
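The finite-difference sum (2) is easy to compute directly. The following sketch is illustrative rather than taken from the text: the function name `finite_difference_sum`, the choice n = 9, and the two test polygons are assumptions. It evaluates the sum for the arc-length integrand on two polygonal lines joining (0, 0) and (1, 1).

```python
import math

def finite_difference_sum(F, interior, a, b, A, B):
    """Evaluate the sum (2).

    interior holds the free ordinates y_1, ..., y_n;
    the end values y_0 = A and y_{n+1} = B are fixed.
    """
    ys = [A] + list(interior) + [B]
    n = len(interior)
    h = (b - a) / (n + 1)
    total = 0.0
    for i in range(1, n + 2):
        x_i = a + i * h
        slope = (ys[i] - ys[i - 1]) / h   # difference quotient replacing y'
        total += F(x_i, ys[i], slope) * h
    return total

F_length = lambda x, y, z: math.sqrt(1.0 + z * z)

n = 9
h = 1.0 / (n + 1)
straight = [i * h for i in range(1, n + 1)]       # vertices on y = x
bent = [(i * h) ** 2 for i in range(1, n + 1)]    # vertices on y = x^2

J_straight = finite_difference_sum(F_length, straight, 0.0, 1.0, 0.0, 1.0)
J_bent = finite_difference_sum(F_length, bent, 0.0, 1.0, 0.0, 1.0)
print(J_straight, J_bent)
```

Here the sum is an ordinary function of the n numbers in `interior`, so any finite-dimensional minimization method could be applied to it; the straight polygon gives the smaller value, as expected for arc length.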

2. Function Spaces 

In the study of functions of n variables, it is convenient to use geometric 
language, by regarding a set of n numbers $(y_1, \ldots, y_n)$ as a point in an
n-dimensional space. In just the same way, geometric language is useful
when studying functionals. Thus, we shall regard each function y(x) 
belonging to some class as a point in some space, and spaces whose elements 
are functions will be called function spaces. 

In the study of functions of a finite number n of independent variables, 
it is sufficient to consider a single space, i.e., n-dimensional Euclidean space
$\mathscr{E}_n$. 3 However, in the case of function spaces, there is no such “universal”
space. In fact, the nature of the problem under consideration determines 
the choice of the function space. For example, if we are dealing with a 
functional of the form 

$$\int_a^b F(x, y, y')\,dx,$$

it is natural to regard the functional as defined on the set of all functions
with a continuous first derivative, while in the case of a functional of the
form

$$\int_a^b F(x, y, y', y'')\,dx,$$

the appropriate function space is the set of all functions with two continuous 
derivatives. Therefore, in studying functionals of various types, it is 
reasonable to use various function spaces. 

The concept of continuity plays an important role for functionals, just 
as it does for the ordinary functions considered in classical analysis. In 
order to formulate this concept for functionals, we must somehow introduce 
a concept of “closeness” for elements in a function space. This is most 
conveniently done by introducing the concept of the norm of a function, 
analogous to the concept of the distance between a point in Euclidean space 
and the origin of coordinates. Although in what follows we shall always 
be concerned with function spaces, it will be most convenient to introduce 
the concept of a norm in a more general and abstract form, by introducing 
the concept of a normed linear space. 

By a linear space, we mean a set $\mathscr{R}$ of elements x, y, z, ... of any kind,
for which the operations of addition and multiplication by (real) numbers
$\alpha, \beta, \ldots$ are defined and obey the following axioms:

1. $x + y = y + x$;

2. $(x + y) + z = x + (y + z)$;

3. There exists an element 0 (the zero element) such that $x + 0 = x$ for
any $x \in \mathscr{R}$; 4

4. For each $x \in \mathscr{R}$, there exists an element $-x$ such that $x + (-x) = 0$;

5. $1 \cdot x = x$;

6. $\alpha(\beta x) = (\alpha\beta)x$;

7. $(\alpha + \beta)x = \alpha x + \beta x$;

8. $\alpha(x + y) = \alpha x + \alpha y$.

3 See e.g., G. E. Shilov, An Introduction to the Theory of Linear Spaces , translated by 
R. A. Silverman, Prentice-Hall, Inc., Englewood Cliffs, N. J. (1961), Theorem 14 and 
Corollary, pp. 48-49. 

4 By $x \in \mathscr{R}$, we mean that the element x belongs to the set $\mathscr{R}$. In these axioms, x, y
and z are arbitrary elements of $\mathscr{R}$, while $\alpha$ and $\beta$ are arbitrary real numbers.
A linear space $\mathscr{R}$ is said to be normed, if each element $x \in \mathscr{R}$ is assigned a
nonnegative number $\|x\|$, called the norm of x, such that

1. $\|x\| = 0$ if and only if $x = 0$;

2. $\|\alpha x\| = |\alpha| \, \|x\|$;

3. $\|x + y\| \leq \|x\| + \|y\|$.


In a normed linear space, we can talk about distances between elements,
by defining the distance between x and y to be the quantity $\|x - y\|$.

The elements of a normed linear space can be objects of any kind, e.g., 
numbers, vectors (directed line segments), matrices, functions, etc. The 
following normed linear spaces are important for our subsequent purposes: 


1. The space $\mathscr{C}$, or more precisely $\mathscr{C}(a, b)$, consisting of all continuous
functions y(x) defined on a (closed) interval [a, b]. By addition of
elements of $\mathscr{C}$ and multiplication of elements of $\mathscr{C}$ by numbers, we mean
ordinary addition of functions and multiplication of functions by
numbers, while the norm is defined as the maximum of the absolute
value, i.e.,

$$\|y\|_0 = \max_{a \leq x \leq b} |y(x)|.$$

Thus, in the space $\mathscr{C}$, the distance between the function $y^*(x)$ and the
function y(x) does not exceed $\varepsilon$ if the graph of the function $y^*(x)$ lies
inside a strip of width $2\varepsilon$ (in the vertical direction) “bordering” the
graph of the function y(x), as shown in Figure 1.


2. The space $\mathscr{D}_1$, or more precisely $\mathscr{D}_1(a, b)$, consisting of all functions
y(x) defined on an interval [a, b] which are continuous and have
continuous first derivatives. The operations of addition and multiplication
by numbers are the same as in $\mathscr{C}$, but the norm is defined by
the formula

$$\|y\|_1 = \max_{a \leq x \leq b} |y(x)| + \max_{a \leq x \leq b} |y'(x)|.$$

Thus, two functions in $\mathscr{D}_1$ are regarded as close together if both the
functions themselves and their first derivatives are close together, since

$$\|y - z\|_1 < \varepsilon$$

implies that

$$|y(x) - z(x)| < \varepsilon, \qquad |y'(x) - z'(x)| < \varepsilon$$

for all $a \leq x \leq b$.
3. The space $\mathscr{D}_n$, or more precisely $\mathscr{D}_n(a, b)$, consisting of all functions
y(x) defined on an interval [a, b] which are continuous and have
continuous derivatives up to order n inclusive, where n is a fixed integer.
Addition of elements of $\mathscr{D}_n$ and multiplication of elements of $\mathscr{D}_n$ by
numbers are defined just as in the preceding cases, but the norm is
now defined by the formula

$$\|y\|_n = \sum_{i=0}^{n} \max_{a \leq x \leq b} |y^{(i)}(x)|,$$

where $y^{(i)}(x) = (d/dx)^i y(x)$ and $y^{(0)}(x)$ denotes the function y(x) itself.
Thus, two functions in $\mathscr{D}_n$ are regarded as close together if the values
of the functions themselves and of all their derivatives up to order n
inclusive are close together. It is easily verified that all the axioms of a
normed linear space are actually satisfied for each of the spaces $\mathscr{C}$, $\mathscr{D}_1$,
and $\mathscr{D}_n$.
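The difference between the norms of the first two spaces can be made concrete with a short computation. This is a sketch under stated assumptions: the maxima are approximated by sampling on a grid, and the helper names and the test function eps*sin(kx) are chosen here for illustration.

```python
import math

def sup_norm(f, a, b, n=10_000):
    """Approximate max |f(x)| on [a, b] by sampling n + 1 grid points."""
    return max(abs(f(a + (b - a) * i / n)) for i in range(n + 1))

def norm_C(y, a, b):
    # Norm of the space of continuous functions: max |y(x)|
    return sup_norm(y, a, b)

def norm_D1(y, dy, a, b):
    # Norm of the space of continuously differentiable functions:
    # max |y(x)| + max |y'(x)|
    return sup_norm(y, a, b) + sup_norm(dy, a, b)

# Assumed test function h(x) = eps * sin(k x) with small eps and large k
eps, k = 0.01, 1000.0
h = lambda x: eps * math.sin(k * x)
dh = lambda x: eps * k * math.cos(k * x)

small = norm_C(h, 0.0, 1.0)        # at most eps
large = norm_D1(h, dh, 0.0, 1.0)   # at least eps * k
print(small, large)
```

A rapidly oscillating function of small amplitude is close to zero in the first norm but far from zero in the second, since its derivative is large.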

Similarly, we can introduce spaces of functions of several variables, e.g., 
the space of continuous functions of n variables, the space of functions of n 
variables with continuous first derivatives, etc. After a norm has been 
introduced in the linear space $\mathscr{R}$ (which may be a function space), it is
natural to talk about continuity of functionals defined on $\mathscr{R}$:

Definition. The functional J[y] is said to be continuous at the point
$\hat{y} \in \mathscr{R}$ if for any $\varepsilon > 0$, there is a $\delta > 0$ such that

$$|J[y] - J[\hat{y}]| < \varepsilon, \tag{3}$$

provided that $\|y - \hat{y}\| < \delta$.

Remark 1. The inequality (3) is equivalent to the two inequalities

$$J[y] - J[\hat{y}] > -\varepsilon \tag{4}$$

and

$$J[y] - J[\hat{y}] < \varepsilon. \tag{5}$$

If in the definition of continuity, we replace (3) by (4), J[y] is said to be lower
semicontinuous at $\hat{y}$, while if we replace (3) by (5), J[y] is said to be upper
semicontinuous at $\hat{y}$. These concepts will be needed in Chapter 8.

Remark 2. At first, it might appear that the space $\mathscr{C}$, which is the largest
of those enumerated, would be adequate for the study of variational problems.
However, this is not the case. In fact, as already mentioned, one of the basic
types of functionals considered in the calculus of variations has the form

$$J[y] = \int_a^b F(x, y, y')\,dx.$$

It is easy to see that such a functional (e.g., arc length) will be continuous if
we interpret closeness of functions as closeness in the space $\mathscr{D}_1$. However,
in general, the functional will not be continuous if we use the norm introduced
in the space $\mathscr{C}$, 5 even though it is continuous in the norm of the space
$\mathscr{D}_1$. Since we want to be able to use ordinary analytic methods, e.g., passage
to the limit, then, given a functional, it is reasonable to choose a function
space such that the functional is continuous.
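The failure of continuity for arc length in the maximum norm can be observed numerically. In the sketch below, which is an illustration and not part of the original text, the reference curve is y = 0 on [0, 1] and the perturbation eps*sin(c*x/eps) with c = 3 is an assumed choice; its amplitude tends to zero with eps while its slope stays of size c.

```python
import math

def arc_length(dy, a, b, n=200_000):
    """Midpoint-rule approximation of integral_a^b sqrt(1 + y'(x)^2) dx."""
    h = (b - a) / n
    return sum(math.sqrt(1.0 + dy(a + (i + 0.5) * h) ** 2) for i in range(n)) * h

def wiggle_slope(eps, c=3.0):
    """Derivative of the perturbation eps * sin(c * x / eps)."""
    return lambda x: c * math.cos(c * x / eps)

# Reference curve: y(x) = 0 on [0, 1], whose length is exactly 1.
results = []
for eps in (1e-1, 1e-2, 1e-3):
    c_distance = eps   # max |eps * sin(...)| <= eps bounds the distance in the maximum norm
    length = arc_length(wiggle_slope(eps), 0.0, 1.0)
    results.append((c_distance, length))
    print(f"distance <= {c_distance:g}   length of perturbed curve = {length:.4f}")
```

The perturbed curves approach the reference curve in the maximum norm, yet their lengths stay well above 1, so arc length is not continuous in that norm. In the norm that also measures the derivative, these curves do not approach the reference curve at all, which is consistent with the remark.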

Remark 3. So far, we have talked about linear spaces and functionals 
defined on them. However, in many variational problems, we have to deal 
with functionals defined on sets of functions which do not form linear spaces. 
In fact, the set of functions (or curves) satisfying the constraints of a given 
variational problem, called the admissible functions (or admissible curves ), 
is in general not a linear space. For example, the admissible curves for the 
“simplest” variational problem (see Sec. 4) are the smooth plane curves 
passing through two fixed points, and the sum of two such curves does not 
pass through the two points. Nevertheless, the concept of a normed linear 
space and the related concepts of the distance between functions, continuity 
of functionals, etc., play an important role in the calculus of variations. A 
similar situation is encountered in elementary analysis, where, in dealing 
with functions of n variables, it is convenient to use the concept of an 
n-dimensional Euclidean space $\mathscr{E}_n$, even though the domain of definition of
a function may not be a linear subspace of $\mathscr{E}_n$.

3. The Variation of a Functional. A Necessary Condition 
for an Extremum 

3.1. In this section, we introduce the concept of the variation (or 
differential) of a functional, analogous to the concept of the differential of a 
function of n variables. The concept will then be used to find extrema of 
functionals. First, we give some preliminary facts and definitions. 

Definition. Given a normed linear space $\mathscr{R}$, let each element $h \in \mathscr{R}$
be assigned a number $\varphi[h]$, i.e., let $\varphi[h]$ be a functional defined on $\mathscr{R}$. Then
$\varphi[h]$ is said to be a (continuous) linear functional if

1. $\varphi[\alpha h] = \alpha\varphi[h]$ for any $h \in \mathscr{R}$ and any real number $\alpha$;

2. $\varphi[h_1 + h_2] = \varphi[h_1] + \varphi[h_2]$ for any $h_1, h_2 \in \mathscr{R}$;

3. $\varphi[h]$ is continuous (for all $h \in \mathscr{R}$).

Example 1. If we associate with each function $h(x) \in \mathscr{C}(a, b)$ its value at
a fixed point $x_0$ in [a, b], i.e., if we define the functional $\varphi[h]$ by the formula

$$\varphi[h] = h(x_0),$$

then $\varphi[h]$ is a linear functional on $\mathscr{C}(a, b)$.


5 Arc length is a typical example of such a functional. For every curve, we can find
another curve arbitrarily close to the first in the sense of the norm of the space $\mathscr{C}$, whose
length differs from that of the first curve by a factor of 10, say.
Example 2. The integral

$$\varphi[h] = \int_a^b h(x)\,dx$$

defines a linear functional on $\mathscr{C}(a, b)$.

Example 3. The integral

$$\varphi[h] = \int_a^b \alpha(x) h(x)\,dx,$$

where $\alpha(x)$ is a fixed function in $\mathscr{C}(a, b)$, defines a linear functional on $\mathscr{C}(a, b)$.

Example 4. More generally, the integral

$$\varphi[h] = \int_a^b [\alpha_0(x) h(x) + \alpha_1(x) h'(x) + \cdots + \alpha_n(x) h^{(n)}(x)]\,dx, \tag{6}$$

where the $\alpha_i(x)$ are fixed functions in $\mathscr{C}(a, b)$, defines a linear functional
on $\mathscr{D}_n(a, b)$.

Suppose the linear functional (6) vanishes for all h(x) belonging to some
class. Then what can be said about the functions $\alpha_i(x)$? Some typical
results in this direction are given by the following lemmas:

Lemma 1. If $\alpha(x)$ is continuous in [a, b], and if

$$\int_a^b \alpha(x) h(x)\,dx = 0$$

for every function $h(x) \in \mathscr{C}(a, b)$ such that h(a) = h(b) = 0, then $\alpha(x) = 0$
for all x in [a, b].

Proof. Suppose the function $\alpha(x)$ is nonzero, say positive, at some
point in [a, b]. Then $\alpha(x)$ is also positive in some interval $[x_1, x_2]$
contained in [a, b]. If we set

$$h(x) = (x - x_1)(x_2 - x)$$

for x in $[x_1, x_2]$ and h(x) = 0 otherwise, then h(x) obviously satisfies
the conditions of the lemma. However,

$$\int_a^b \alpha(x) h(x)\,dx = \int_{x_1}^{x_2} \alpha(x)(x - x_1)(x_2 - x)\,dx > 0,$$

since the integrand is positive (except at $x_1$ and $x_2$). This contradiction
proves the lemma.

Remark. The lemma still holds if we replace $\mathscr{C}(a, b)$ by $\mathscr{D}_n(a, b)$. To
see this, we use the same proof with

$$h(x) = [(x - x_1)(x_2 - x)]^{n+1}$$

for x in $[x_1, x_2]$ and h(x) = 0 otherwise.
Lemma 2. If $\alpha(x)$ is continuous in [a, b], and if

$$\int_a^b \alpha(x) h'(x)\,dx = 0$$

for every function $h(x) \in \mathscr{D}_1(a, b)$ such that h(a) = h(b) = 0, then
$\alpha(x) = c$ for all x in [a, b], where c is a constant.

Proof. Let c be the constant defined by the condition

$$\int_a^b [\alpha(x) - c]\,dx = 0,$$

and let

$$h(x) = \int_a^x [\alpha(\xi) - c]\,d\xi,$$

so that h(x) automatically belongs to $\mathscr{D}_1(a, b)$ and satisfies the conditions
h(a) = h(b) = 0. Then on the one hand,

$$\int_a^b [\alpha(x) - c] h'(x)\,dx = \int_a^b \alpha(x) h'(x)\,dx - c[h(b) - h(a)] = 0,$$

while on the other hand,

$$\int_a^b [\alpha(x) - c] h'(x)\,dx = \int_a^b [\alpha(x) - c]^2\,dx.$$

It follows that $\alpha(x) - c = 0$, i.e., $\alpha(x) = c$, for all x in [a, b].

The next lemma will be needed in Chapter 8:

Lemma 3. If $\alpha(x)$ is continuous in [a, b], and if

$$\int_a^b \alpha(x) h''(x)\,dx = 0$$

for every function $h(x) \in \mathscr{D}_2(a, b)$ such that h(a) = h(b) = 0 and
h'(a) = h'(b) = 0, then $\alpha(x) = c_0 + c_1 x$ for all x in [a, b], where $c_0$ and $c_1$
are constants.

Proof. Let $c_0$ and $c_1$ be defined by the conditions

$$\int_a^b [\alpha(x) - c_0 - c_1 x]\,dx = 0, \qquad \int_a^b dx \int_a^x [\alpha(\xi) - c_0 - c_1 \xi]\,d\xi = 0, \tag{7}$$

and let

$$h(x) = \int_a^x d\xi \int_a^\xi [\alpha(t) - c_0 - c_1 t]\,dt,$$

so that h(x) automatically belongs to $\mathscr{D}_2(a, b)$ and satisfies the conditions
h(a) = h(b) = 0, h'(a) = h'(b) = 0. Then on the one hand,

$$\begin{aligned}
\int_a^b [\alpha(x) - c_0 - c_1 x] h''(x)\,dx
&= \int_a^b \alpha(x) h''(x)\,dx - c_0[h'(b) - h'(a)] - c_1 \int_a^b x h''(x)\,dx \\
&= -c_1[b h'(b) - a h'(a)] + c_1[h(b) - h(a)] = 0,
\end{aligned}$$

while on the other hand,

$$\int_a^b [\alpha(x) - c_0 - c_1 x] h''(x)\,dx = \int_a^b [\alpha(x) - c_0 - c_1 x]^2\,dx = 0.$$

It follows that $\alpha(x) - c_0 - c_1 x = 0$, i.e., $\alpha(x) = c_0 + c_1 x$, for all x in
[a, b].

Lemma 4. If $\alpha(x)$ and $\beta(x)$ are continuous in [a, b], and if

$$\int_a^b [\alpha(x) h(x) + \beta(x) h'(x)]\,dx = 0 \tag{8}$$

for every function $h(x) \in \mathscr{D}_1(a, b)$ such that h(a) = h(b) = 0, then $\beta(x)$
is differentiable, and $\beta'(x) = \alpha(x)$ for all x in [a, b].

Proof. Setting

$$A(x) = \int_a^x \alpha(\xi)\,d\xi,$$

and integrating by parts, we find that

$$\int_a^b \alpha(x) h(x)\,dx = -\int_a^b A(x) h'(x)\,dx,$$

i.e., (8) can be rewritten as

$$\int_a^b [-A(x) + \beta(x)] h'(x)\,dx = 0.$$

But, according to Lemma 2, this implies that

$$\beta(x) - A(x) = \text{const},$$

and hence by the definition of A(x),

$$\beta'(x) = \alpha(x),$$

for all x in [a, b], as asserted. We emphasize that the differentiability
of the function $\beta(x)$ was not assumed in advance.

3.2. We now introduce the concept of the variation (or differential) of a
functional. Let J[y] be a functional defined on some normed linear space,
and let

$$\Delta J[h] = J[y + h] - J[y]$$

be its increment, corresponding to the increment h = h(x) of the “independent
variable” y = y(x). If y is fixed, $\Delta J[h]$ is a functional of h, in general a
nonlinear functional. Suppose that

$$\Delta J[h] = \varphi[h] + \varepsilon\|h\|,$$

where $\varphi[h]$ is a linear functional and $\varepsilon \to 0$ as $\|h\| \to 0$. Then the functional
J[y] is said to be differentiable, and the principal linear part of the increment
$\Delta J[h]$, i.e., the linear functional $\varphi[h]$ which differs from $\Delta J[h]$ by an infinitesimal
of order higher than 1 relative to $\|h\|$, is called the variation (or differential)
of J[y] and is denoted by $\delta J[h]$. 6
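As a small worked illustration of this definition, consider the functional given by the integral of $y^2(x)$ over [a, b], regarded on the space of continuous functions with the maximum norm $\|h\|_0$. This particular functional is an assumed example, not one treated at this point in the text. Expanding the increment separates a part that is linear in h from a remainder of higher order:

```latex
\begin{aligned}
J[y] &= \int_a^b y^2(x)\,dx, \\
\Delta J[h] &= \int_a^b [y(x) + h(x)]^2\,dx - \int_a^b y^2(x)\,dx
             = 2\int_a^b y(x) h(x)\,dx + \int_a^b h^2(x)\,dx, \\
0 &\leq \int_a^b h^2(x)\,dx \leq (b - a)\,\|h\|_0^2 = \varepsilon\,\|h\|_0,
   \qquad \varepsilon \leq (b - a)\,\|h\|_0 \to 0 \ \text{ as } \ \|h\|_0 \to 0, \\
\delta J[h] &= 2\int_a^b y(x) h(x)\,dx .
\end{aligned}
```

The first term of the increment is a linear functional of h, of the same type as an integral of a fixed continuous function times h, so it is the variation; the second term is the infinitesimal of higher order.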

Theorem 1. The differential of a differentiable functional is unique.

Proof. First, we note that if $\varphi[h]$ is a linear functional and if

$$\frac{\varphi[h]}{\|h\|} \to 0$$

as $\|h\| \to 0$, then $\varphi[h] \equiv 0$, i.e., $\varphi[h] = 0$ for all h. In fact, suppose
$\varphi[h_0] \neq 0$ for some $h_0 \neq 0$. Then, setting

$$h_n = \frac{h_0}{n}, \qquad \lambda = \frac{\varphi[h_0]}{\|h_0\|},$$

we see that $\|h_n\| \to 0$ as $n \to \infty$, but

$$\lim_{n \to \infty} \frac{\varphi[h_n]}{\|h_n\|} = \lim_{n \to \infty} \frac{n \varphi[h_0]}{n \|h_0\|} = \lambda \neq 0,$$

contrary to hypothesis.

Now, suppose the differential of the functional J[y] is not uniquely
defined, so that

$$\Delta J[h] = \varphi_1[h] + \varepsilon_1\|h\|,$$

$$\Delta J[h] = \varphi_2[h] + \varepsilon_2\|h\|,$$

where $\varphi_1[h]$ and $\varphi_2[h]$ are linear functionals, and $\varepsilon_1, \varepsilon_2 \to 0$ as $\|h\| \to 0$.
This implies

$$\varphi_1[h] - \varphi_2[h] = \varepsilon_2\|h\| - \varepsilon_1\|h\|,$$

and hence $\varphi_1[h] - \varphi_2[h]$ is an infinitesimal of order higher than 1 relative
to $\|h\|$. But since $\varphi_1[h] - \varphi_2[h]$ is a linear functional, it follows from the
first part of the proof that $\varphi_1[h] - \varphi_2[h]$ vanishes identically, as asserted.


Next, we use the concept of the variation (or differential) of a functional
to establish a necessary condition for a functional to have an extremum.
We begin by recalling the corresponding concepts from analysis. Let
$F(x_1, \ldots, x_n)$ be a differentiable function of n variables. Then $F(x_1, \ldots, x_n)$
is said to have a (relative) extremum at the point $(\hat{x}_1, \ldots, \hat{x}_n)$ if

$$\Delta F = F(x_1, \ldots, x_n) - F(\hat{x}_1, \ldots, \hat{x}_n)$$

has the same sign for all points $(x_1, \ldots, x_n)$ belonging to some neighborhood
of $(\hat{x}_1, \ldots, \hat{x}_n)$, where the extremum $F(\hat{x}_1, \ldots, \hat{x}_n)$ is a minimum if $\Delta F \geq 0$
and a maximum if $\Delta F \leq 0$.

Analogously, we say that the functional J[y] has a (relative) extremum
for $y = \hat{y}$ if $J[y] - J[\hat{y}]$ does not change its sign in some neighborhood of


6 Strictly speaking, of course, the increment and the variation of J[y] are functionals
of two arguments y and h, and to emphasize this fact, we might write $\Delta J[y; h] = \delta J[y; h] + \varepsilon\|h\|$.
the curve $y = \hat{y}(x)$. Subsequently, we shall be concerned with functionals
defined on some set of continuously differentiable functions, and the functions
themselves can be regarded either as elements of the space $\mathscr{C}$ or elements
of the space $\mathscr{D}_1$. Corresponding to these two possibilities, we can define
two kinds of extrema: We shall say that the functional J[y] has a weak
extremum for $y = \hat{y}$ if there exists an $\varepsilon > 0$ such that $J[y] - J[\hat{y}]$ has the
same sign for all y in the domain of definition of the functional which satisfy
the condition $\|y - \hat{y}\|_1 < \varepsilon$, where $\| \cdot \|_1$ denotes the norm in the space $\mathscr{D}_1$.
On the other hand, we shall say that the functional J[y] has a strong extremum
for $y = \hat{y}$ if there exists an $\varepsilon > 0$ such that $J[y] - J[\hat{y}]$ has the same sign
for all y in the domain of definition of the functional which satisfy the
condition $\|y - \hat{y}\|_0 < \varepsilon$, where $\| \cdot \|_0$ denotes the norm in the space $\mathscr{C}$.

It is clear that every strong extremum is simultaneously a weak extremum,
since if $\|y - \hat{y}\|_1 < \varepsilon$, then $\|y - \hat{y}\|_0 < \varepsilon$, a fortiori, and hence, if $J[\hat{y}]$ is
an extremum with respect to all y such that $\|y - \hat{y}\|_0 < \varepsilon$, then $J[\hat{y}]$ is
certainly an extremum with respect to all y such that $\|y - \hat{y}\|_1 < \varepsilon$. However,
the converse is not true in general, i.e., a weak extremum may not be a
strong extremum. As a rule, finding a weak extremum is simpler than
finding a strong extremum. The reason for this is that the functionals
usually considered in the calculus of variations are continuous in the norm
of the space $\mathscr{D}_1$ (as noted at the end of the previous section), and this continuity
can be exploited in the theory of weak extrema. In general, however,
our functionals will not be continuous in the norm of the space $\mathscr{C}$.

Theorem 2. A necessary condition for the differentiable functional
J[y] to have an extremum for $y = \hat{y}$ is that its variation vanish for $y = \hat{y}$,
i.e., that

$$\delta J[h] = 0$$

for $y = \hat{y}$ and all admissible h.

Proof. To be explicit, suppose J[y] has a minimum for $y = \hat{y}$.
According to the definition of the variation $\delta J[h]$, we have

$$\Delta J[h] = \delta J[h] + \varepsilon\|h\|, \tag{9}$$

where $\varepsilon \to 0$ as $\|h\| \to 0$. Thus, for sufficiently small $\|h\|$, the sign of
$\Delta J[h]$ will be the same as the sign of $\delta J[h]$. Now, suppose that
$\delta J[h_0] \neq 0$ for some admissible $h_0$. Then for any $\alpha > 0$, no matter
how small, we have

$$\delta J[-\alpha h_0] = -\delta J[\alpha h_0].$$

Hence, (9) can be made to have either sign for arbitrarily small $\|h\|$.
But this is impossible, since by hypothesis J[y] has a minimum for $y = \hat{y}$,
i.e.,

$$\Delta J[h] = J[\hat{y} + h] - J[\hat{y}] \geq 0$$

for all sufficiently small $\|h\|$. This contradiction proves the theorem.
Remark. In elementary analysis, it is proved that for a function to have
a minimum, it is necessary not only that its first differential vanish (df = 0),
but also that its second differential be nonnegative. Consideration of the
analogous problem for functionals will be postponed until Chapter 5.


4. The Simplest Variational Problem. Euler’s Equation

4.1. We begin our study of concrete variational problems by considering
what might be called the “simplest” variational problem, which can be
formulated as follows: Let F(x, y, z) be a function with continuous first and
second (partial) derivatives with respect to all its arguments. Then, among
all functions y(x) which are continuously differentiable for $a \leq x \leq b$ and
satisfy the boundary conditions

$$y(a) = A, \qquad y(b) = B, \tag{10}$$

find the function for which the functional

$$J[y] = \int_a^b F(x, y, y')\,dx \tag{11}$$

has a weak extremum. In other words, the simplest variational problem
consists of finding a weak extremum of a functional of the form (11), where
the class of admissible curves (see p. 8) consists of all smooth curves joining
two points. The first two examples on pp. 2, 3, involving the brachistochrone
and the shortest distance between two points, are variational problems of
just this type. To apply the necessary condition for an extremum (found in
Sec. 3.2) to the problem just formulated, we have to be able to calculate the
variation of a functional of the type (11). We now derive the appropriate
formula for this variation.

Suppose we give $y(x)$ an increment $h(x)$, where, in order for the function

$$y(x) + h(x)$$

to continue to satisfy the boundary conditions, we must have

$$h(a) = h(b) = 0.$$

Then, since the corresponding increment of the functional (11) equals

$$\Delta J = J[y + h] - J[y] = \int_a^b F(x, y + h, y' + h')\,dx - \int_a^b F(x, y, y')\,dx
= \int_a^b [F(x, y + h, y' + h') - F(x, y, y')]\,dx,$$

it follows by using Taylor’s theorem that

$$\Delta J = \int_a^b [F_y(x, y, y')h + F_{y'}(x, y, y')h']\,dx + \cdots, \tag{12}$$

where the subscripts denote partial derivatives with respect to the corresponding
arguments, and the dots denote terms of order higher than 1 relative
to $h$ and $h'$. The integral in the right-hand side of (12) represents the principal
linear part of the increment $\Delta J$, and hence the variation of $J[y]$ is

$$\delta J = \int_a^b [F_y(x, y, y')h + F_{y'}(x, y, y')h']\,dx.$$

According to Theorem 2 of Sec. 3.2, a necessary condition for $J[y]$ to have
an extremum for $y = y(x)$ is that

$$\delta J = \int_a^b (F_y h + F_{y'} h')\,dx = 0 \tag{13}$$

for all admissible $h$. But according to Lemma 4 of Sec. 3.1, (13) implies
that

$$F_y - \frac{d}{dx} F_{y'} = 0, \tag{14}$$

a result known as Euler's equation.⁷ Thus, we have proved
Theorem 1. Let $J[y]$ be a functional of the form

$$\int_a^b F(x, y, y')\,dx,$$

defined on the set of functions $y(x)$ which have continuous first derivatives
in $[a, b]$ and satisfy the boundary conditions $y(a) = A$, $y(b) = B$. Then
a necessary condition for $J[y]$ to have an extremum for a given function
$y(x)$ is that $y(x)$ satisfy Euler's equation⁸

$$F_y - \frac{d}{dx} F_{y'} = 0.$$
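As a numerical illustration of Theorem 1, the following Python sketch takes the integrand $F = y'^2 + y^2$ on $[0, 1]$ with $y(0) = 0$, $y(1) = 1$. This integrand, the perturbation $h(x) = \sin \pi x$ and all function names are assumptions chosen here for illustration; they do not come from the text. The Euler equation is then $y'' = y$, with extremal $y = \sinh x / \sinh 1$, and the code checks by the midpoint rule that adding $t\,h$ to the extremal does not lower $J$.

```python
import math

def J(y, dy, a=0.0, b=1.0, n=2000):
    """Midpoint-rule approximation of the integral of y'^2 + y^2 over [a, b]."""
    dx = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * dx
        total += (dy(x) ** 2 + y(x) ** 2) * dx
    return total

# Extremal of the assumed functional: solves y'' = y with y(0) = 0, y(1) = 1.
def y_ext(x):
    return math.sinh(x) / math.sinh(1.0)

def dy_ext(x):
    return math.cosh(x) / math.sinh(1.0)

def perturbed(t):
    """Return (y, y') for y_ext + t*h, where h(x) = sin(pi x) vanishes at 0 and 1."""
    y = lambda x: y_ext(x) + t * math.sin(math.pi * x)
    dy = lambda x: dy_ext(x) + t * math.pi * math.cos(math.pi * x)
    return y, dy

J0 = J(y_ext, dy_ext)
increments = {t: J(*perturbed(t)) - J0 for t in (-0.1, -0.01, 0.01, 0.1)}
for t, dJ in increments.items():
    print(f"t = {t:+.2f}   J[y + t h] - J[y] = {dJ:.6e}")
```

The increments scale like $t^2$, which is what one expects when the linear part (the variation) vanishes along the extremal.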



The integral curves of Euler’s equation are called extremals. Since
Euler’s equation is a second-order differential equation, its solution will in
general depend on two arbitrary constants, which are determined from the
boundary conditions $y(a) = A$, $y(b) = B$. The problem usually considered
in the theory of differential equations is that of finding a solution which is
defined in the neighborhood of some point and satisfies given initial conditions
(Cauchy's problem). However, in solving Euler’s equation, we are
looking for a solution which is defined over all of some fixed region and
satisfies given boundary conditions. Therefore, the question of whether
or not a certain variational problem has a solution does not just reduce to the
usual existence theorems for differential equations. In this regard, we now
state a theorem due to Bernstein,⁹ concerning the existence and uniqueness of
solutions “in the large” of an equation of the form

$$y'' = F(x, y, y'). \tag{15}$$

⁷ We emphasize that the existence of the derivative $(d/dx)F_{y'}$ is not assumed in
advance, but follows from the very same lemma.

⁸ This condition is necessary for a weak extremum. Since every strong extremum is
simultaneously a weak extremum, any necessary condition for a weak extremum is
also a necessary condition for a strong extremum.

Theorem 2 (Bernstein). If the functions $F$, $F_y$ and $F_{y'}$ are continuous
at every finite point $(x, y)$ for any finite $y'$, and if a constant $k > 0$ and
functions

$$\alpha = \alpha(x, y) \geq 0, \qquad \beta = \beta(x, y) \geq 0$$

(which are bounded in every finite region of the plane) can be found such
that

$$F_y(x, y, y') > k, \qquad |F(x, y, y')| \leq \alpha y'^2 + \beta,$$

then one and only one integral curve of equation (15) passes through any
two points $(a, A)$ and $(b, B)$ with different abscissas $(a \neq b)$.

Equation (13) gives a necessary condition for an extremum, but in general, 
one which is not sufficient. The question of sufficient conditions for an 
extremum will be considered in Chapter 5. In many cases, however, 
Euler’s equation by itself is enough to give a complete solution of the problem.
In fact, the existence of an extremum is often clear from the physical or
geometric meaning of the problem, e.g., in the brachistochrone problem, 
the problem concerning the shortest distance between two points, etc. If in 
such a case there exists only one extremal satisfying the boundary conditions 
of the problem, this extremal must perforce be the curve for which the 
extremum is achieved. 

⁹ S. N. Bernstein, Sur les équations du calcul des variations, Ann. Sci. École Norm.
Sup., 29, 431-485 (1912).

For a functional of the form

$$\int_a^b F(x, y, y')\,dx$$

Euler’s equation is in general a second-order differential equation, but it
may turn out that the curve for which the functional has its extremum is
not twice differentiable. For example, consider the functional

$$J[y] = \int_{-1}^{1} y^2 (2x - y')^2\,dx,$$

where

$$y(-1) = 0, \qquad y(1) = 1.$$

The minimum of $J[y]$ equals zero and is achieved for the function

$$y = \begin{cases} 0 & \text{for } -1 \leq x \leq 0, \\ x^2 & \text{for } 0 < x \leq 1, \end{cases}$$

which has no second derivative for $x = 0$. Nevertheless, $y(x)$ satisfies
the appropriate Euler equation. In fact, since in this case

$$F(x, y, y') = y^2 (2x - y')^2,$$

it follows that all the functions

$$F_y = 2y(2x - y')^2, \qquad F_{y'} = -2y^2 (2x - y'), \qquad \frac{d}{dx} F_{y'}$$

vanish identically for $-1 \leq x \leq 1$. Thus, despite the fact that Euler’s
equation is of the second order and $y''(x)$ does not exist everywhere in
$[-1, 1]$, substitution of $y(x)$ into Euler’s equation converts it into an identity.
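The claims in this example can be checked directly. The sketch below is an added illustration (the function names and the grid are hypothetical choices): it evaluates $J$, $F_y$ and $F_{y'}$ along the curve equal to $0$ on $[-1, 0]$ and to $x^2$ on $(0, 1]$, and compares the one-sided difference quotients of $y'$ at $x = 0$.

```python
def y(x):
    """The minimizing curve: 0 on [-1, 0], x^2 on (0, 1]."""
    return 0.0 if x <= 0 else x * x

def dy(x):
    """Its first derivative, which is continuous at x = 0."""
    return 0.0 if x <= 0 else 2.0 * x

def F(x, y_val, dy_val):
    return y_val ** 2 * (2.0 * x - dy_val) ** 2

def F_y(x, y_val, dy_val):
    return 2.0 * y_val * (2.0 * x - dy_val) ** 2

def F_dy(x, y_val, dy_val):
    return -2.0 * y_val ** 2 * (2.0 * x - dy_val)

n = 1000
xs = [-1.0 + 2.0 * (i + 0.5) / n for i in range(n)]   # midpoints of a uniform grid
J = sum(F(x, y(x), dy(x)) for x in xs) * (2.0 / n)
max_Fy = max(abs(F_y(x, y(x), dy(x))) for x in xs)
max_Fdy = max(abs(F_dy(x, y(x), dy(x))) for x in xs)

# One-sided difference quotients of y' at x = 0 disagree (0 versus 2),
# so y'' does not exist there even though Euler's equation holds.
h = 1e-3
left = (dy(0.0) - dy(-h)) / h
right = (dy(h) - dy(0.0)) / h
print(J, max_Fy, max_Fdy, left, right)
```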

We now give conditions guaranteeing that a solution of Euler’s equation 
has a second derivative: 


Theorem 3. Suppose $y = y(x)$ has a continuous first derivative and
satisfies Euler's equation

$$F_y - \frac{d}{dx} F_{y'} = 0.$$

Then, if the function $F(x, y, y')$ has continuous first and second derivatives
with respect to all its arguments, $y(x)$ has a continuous second derivative
at all points $(x, y)$ where

$$F_{y'y'}[x, y(x), y'(x)] \neq 0.$$

Proof. Consider the difference

$$\Delta F_{y'} = F_{y'}(x + \Delta x, y + \Delta y, y' + \Delta y') - F_{y'}(x, y, y')
= \Delta x\,\overline{F}_{y'x} + \Delta y\,\overline{F}_{y'y} + \Delta y'\,\overline{F}_{y'y'},$$

where the overbar indicates that the corresponding derivatives are evaluated
along certain intermediate curves. We divide this difference by
$\Delta x$, and consider the limit of the resulting expression

$$\overline{F}_{y'x} + \frac{\Delta y}{\Delta x}\,\overline{F}_{y'y} + \frac{\Delta y'}{\Delta x}\,\overline{F}_{y'y'}$$

as $\Delta x \to 0$. (This limit exists, since $F_{y'}$ has a derivative with respect to
$x$, which, according to Euler’s equation, equals $F_y$.) Since, by hypothesis,
the second derivatives of $F(x, y, z)$ are continuous, then, as
$\Delta x \to 0$, $\overline{F}_{y'x}$ converges to $F_{y'x}$, i.e., to the value of $\partial^2 F/\partial y'\,\partial x$ at the point
$x$. It follows from the existence of $y'$ and the continuity of the second
derivative $F_{y'y}$ that the second term $(\Delta y/\Delta x)\overline{F}_{y'y}$ also has a limit as
$\Delta x \to 0$. But then the third term also has a limit (since the limit of the
sum of the three terms exists), i.e., the limit

$$\lim_{\Delta x \to 0} \frac{\Delta y'}{\Delta x}\,\overline{F}_{y'y'}$$

exists. As $\Delta x \to 0$, $\overline{F}_{y'y'}$ converges to $F_{y'y'} \neq 0$, and hence

$$\lim_{\Delta x \to 0} \frac{\Delta y'}{\Delta x} = y''(x)$$

exists. Finally, from the equation

$$\frac{d}{dx} F_{y'} - F_y = 0,$$

we can find an expression for $y''$, from which it is clear that $y''$ is
continuous wherever $F_{y'y'} \neq 0$. This proves the theorem.

Remark. Here it is assumed that the extremals are smooth.¹⁰ In Sec. 15
we shall consider the case where the solution of a variational problem may
only be piecewise smooth, i.e., may have “corners” at certain points.

4.2. Euler’s equation (14) plays a fundamental role in the calculus of
variations, and is in general a second-order differential equation. We now 
indicate some special cases where Euler’s equation can be reduced to a first- 
order differential equation, or where its solution can be obtained entirely 
in terms of quadratures (i.e., by evaluating integrals). 

Case 1. Suppose the integrand does not depend on $y$, i.e., let the functional
under consideration have the form

$$\int_a^b F(x, y')\,dx,$$

where $F$ does not contain $y$ explicitly. In this case, Euler’s equation becomes

$$\frac{d}{dx} F_{y'} = 0,$$

which obviously has the first integral

$$F_{y'} = C, \tag{16}$$

where $C$ is a constant. This is a first-order differential equation which
does not contain $y$. Solving (16) for $y'$, we obtain an equation of the form

$$y' = f(x, C),$$

from which $y$ can be found by a quadrature.

¹⁰ We say that the function $y(x)$ is smooth in an interval $[a, b]$ if it is continuous in
$[a, b]$, and has a continuous derivative in $[a, b]$. We say that $y(x)$ is piecewise smooth in
$[a, b]$ if it is continuous everywhere in $[a, b]$, and has a continuous derivative in $[a, b]$
except possibly at a finite number of points.

Case 2. If the integrand does not depend on $x$, i.e., if

$$J[y] = \int_a^b F(y, y')\,dx,$$

then

$$F_y - \frac{d}{dx} F_{y'} = F_y - F_{y'y}\,y' - F_{y'y'}\,y''. \tag{17}$$

Multiplying (17) by $y'$, we obtain

$$F_y y' - F_{y'y}\,y'^2 - F_{y'y'}\,y' y'' = \frac{d}{dx}(F - y' F_{y'}).$$

Thus, in this case, Euler’s equation has the first integral

$$F - y' F_{y'} = C,$$

where $C$ is a constant.
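The first integral of Case 2 is easy to observe numerically. The sketch below uses the integrand $F = y'^2 - y^2$, an assumption made here purely for illustration, whose Euler equation is $y'' = -y$. Along the extremal $y = \sin x$ the quantity $F - y' F_{y'}$ stays constant, while along a curve that is not an extremal ($y = x^2$) it does not.

```python
import math

def F(y, dy):
    # Hypothetical integrand with no explicit x-dependence (chosen for illustration).
    return dy ** 2 - y ** 2

def F_dy(y, dy):
    return 2.0 * dy

def first_integral(y, dy):
    """The quantity F - y' F_{y'} from Case 2."""
    return F(y, dy) - dy * F_dy(y, dy)

# y = sin x satisfies the Euler equation y'' = -y of this integrand.
xs = [0.1 * i for i in range(31)]
values = [first_integral(math.sin(x), math.cos(x)) for x in xs]
spread = max(values) - min(values)

# For comparison, y = x^2 is not an extremal, and the same quantity varies along it.
values_bad = [first_integral(x * x, 2.0 * x) for x in xs]
spread_bad = max(values_bad) - min(values_bad)
print(values[0], spread, spread_bad)
```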

Case 3. If $F$ does not depend on $y'$, Euler’s equation takes the form

$$F_y(x, y) = 0,$$

and hence is not a differential equation, but a “finite” equation, whose
solution consists of one or more curves $y = y(x)$.

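Case 4. In a variety of problems, one encounters functionals of the form

$$\int_a^b f(x, y)\sqrt{1 + y'^2}\,dx,$$

representing the integral of a function $f(x, y)$ with respect to the arc length
$s$ ($ds = \sqrt{1 + y'^2}\,dx$). In this case, Euler’s equation can be transformed
into

$$\frac{\partial F}{\partial y} - \frac{d}{dx}\left(\frac{\partial F}{\partial y'}\right)
= f_y \sqrt{1 + y'^2} - \frac{d}{dx}\left[f(x, y)\,\frac{y'}{\sqrt{1 + y'^2}}\right]$$

$$= f_y \sqrt{1 + y'^2} - f_x\,\frac{y'}{\sqrt{1 + y'^2}} - f_y\,\frac{y'^2}{\sqrt{1 + y'^2}} - f\,\frac{y''}{(1 + y'^2)^{3/2}}$$

$$= \frac{1}{\sqrt{1 + y'^2}}\left[f_y - f_x y' - f\,\frac{y''}{1 + y'^2}\right] = 0,$$

i.e.,

$$f_y - f_x y' - f\,\frac{y''}{1 + y'^2} = 0.$$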


Example 1. Suppose that

$$J[y] = \int_1^2 \frac{\sqrt{1 + y'^2}}{x}\,dx, \qquad y(1) = 0, \quad y(2) = 1.$$

The integrand does not contain $y$, and hence Euler’s equation has the form
$F_{y'} = C$ (cf. Case 1). Thus,

$$\frac{y'}{x\sqrt{1 + y'^2}} = C,$$

so that

$$y'^2 (1 - C^2 x^2) = C^2 x^2$$

or

$$y' = \frac{Cx}{\sqrt{1 - C^2 x^2}},$$

from which it follows that

$$y = \int \frac{Cx\,dx}{\sqrt{1 - C^2 x^2}} = -\frac{1}{C}\sqrt{1 - C^2 x^2} + C_1$$

or

$$(y - C_1)^2 + x^2 = \frac{1}{C^2}.$$

Thus, the solution is a circle with its center on the y-axis. From the
conditions $y(1) = 0$, $y(2) = 1$, we find that

$$C = \frac{1}{\sqrt{5}}, \qquad C_1 = 2,$$

so that the final solution is

$$(y - 2)^2 + x^2 = 5.$$
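The result of Example 1 can be verified numerically. The sketch below is an added check (helper names are hypothetical): it takes the lower arc $y = 2 - \sqrt{5 - x^2}$ of the circle, which is the branch passing through both end points, approximates $y'$ by a central difference, and confirms that $F_{y'}$ keeps the constant value $1/\sqrt{5}$ on $[1, 2]$.

```python
import math

def y(x):
    """Arc of the circle (y - 2)^2 + x^2 = 5 through (1, 0) and (2, 1)."""
    return 2.0 - math.sqrt(5.0 - x * x)

def dy(x, h=1e-6):
    """Central-difference approximation of y'(x)."""
    return (y(x + h) - y(x - h)) / (2.0 * h)

def F_dy(x):
    """F_{y'} = y' / (x sqrt(1 + y'^2)) for F = sqrt(1 + y'^2) / x."""
    p = dy(x)
    return p / (x * math.sqrt(1.0 + p * p))

xs = [1.0 + 0.1 * i for i in range(11)]
values = [F_dy(x) for x in xs]
print("y(1) =", y(1.0), " y(2) =", y(2.0))
print("F_y' ranges over", min(values), "to", max(values))
print("1/sqrt(5) =", 1.0 / math.sqrt(5.0))
```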

Example 2. Among all the curves joining two given points $(x_0, y_0)$ and
$(x_1, y_1)$, find the one which generates the surface of minimum area when rotated
about the x-axis. As we know, the area of the surface of revolution generated
by rotating the curve $y = y(x)$ about the x-axis is

$$2\pi \int_{x_0}^{x_1} y \sqrt{1 + y'^2}\,dx.$$

Since the integrand does not depend explicitly on $x$, Euler’s equation has the
first integral

$$F - y' F_{y'} = C$$

(cf. Case 2), i.e.,

$$y\sqrt{1 + y'^2} - \frac{y y'^2}{\sqrt{1 + y'^2}} = C$$

or

$$y = C\sqrt{1 + y'^2},$$

so that

$$y' = \sqrt{\frac{y^2 - C^2}{C^2}}.$$

Separating variables, we obtain

$$dx = \frac{C\,dy}{\sqrt{y^2 - C^2}},$$

i.e.,

$$x + C_1 = C \ln \frac{y + \sqrt{y^2 - C^2}}{C},$$

so that

$$y = C \cosh \frac{x + C_1}{C}. \tag{18}$$

Thus, the required curve is a catenary passing through the two given
points. The surface generated by rotation of the catenary is called a catenoid.
The values of the arbitrary constants $C$ and $C_1$ are determined by the
conditions

$$y(x_0) = y_0, \qquad y(x_1) = y_1.$$

It can be shown that the following three cases are possible, depending on
the positions of the points $(x_0, y_0)$ and $(x_1, y_1)$:

1. If a single curve of the form (18) can be drawn through the points
$(x_0, y_0)$ and $(x_1, y_1)$, this curve is the solution of the problem [see
Figure 2(a)].

2. If two extremals can be drawn through the points $(x_0, y_0)$ and $(x_1, y_1)$,
one of the curves actually corresponds to the surface of revolution
of minimum area, and the other does not.

3. If there is no curve of the form (18) passing through the points $(x_0, y_0)$
and $(x_1, y_1)$, there is no surface in the class of smooth surfaces of revolution
which achieves the minimum area. In fact, if the location of the
two points is such that the distance between them is sufficiently large
compared to their distances from the x-axis, then the area of the surface
consisting of two circles of radius $y_0$ and $y_1$, plus the segment of the
x-axis joining them [see Figure 2(b)] will be less than the area of any
surface of revolution generated by a smooth curve passing through the
points. Thus, in this case the surface of revolution generated by the
polygonal line $A x_0 x_1 B$ has the minimum area, and there is no surface
of minimum area in the class of surfaces generated by rotation about the
x-axis of smooth curves passing through the given points. (This case,
corresponding to a “broken extremal,” will be discussed further in
Sec. 15.)
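The dependence on the position of the end points can be explored numerically. The sketch below makes simplifying assumptions that are not in the text: the end points are placed symmetrically at $(-a, 1)$ and $(a, 1)$, so that $C_1 = 0$ in (18), and the function names and scan parameters are hypothetical. It counts the catenaries through the two points for a small and a large separation and evaluates the corresponding areas.

```python
import math

def catenary_constants(a, u_max=20.0, steps=20000):
    """Constants C of the catenaries y = C cosh(x / C) through (-a, 1) and (a, 1).

    With u = a / C the end point condition C cosh(a / C) = 1 reads
    cosh(u) / u = 1 / a; its roots are located by a sign scan plus bisection.
    """
    g = lambda u: math.cosh(u) / u - 1.0 / a
    roots = []
    du = u_max / steps
    u_prev, g_prev = du, g(du)
    for i in range(2, steps + 1):
        u = i * du
        g_cur = g(u)
        if g_prev * g_cur < 0.0:
            lo, hi = u_prev, u
            for _ in range(60):          # bisection on the bracketing interval
                mid = 0.5 * (lo + hi)
                if g(lo) * g(mid) <= 0.0:
                    hi = mid
                else:
                    lo = mid
            roots.append(a / (0.5 * (lo + hi)))
        u_prev, g_prev = u, g_cur
    return roots

def catenoid_area(a, C):
    """Area 2*pi * integral of y sqrt(1 + y'^2) over [-a, a] for y = C cosh(x / C)."""
    return math.pi * C * (2.0 * a + C * math.sinh(2.0 * a / C))

for a in (0.5, 0.8):
    Cs = catenary_constants(a)
    print(f"a = {a}: {len(Cs)} catenaries",
          [(round(C, 4), round(catenoid_area(a, C), 4)) for C in Cs])
```

In this symmetric setting the two disks of radius 1 have total area $2\pi$, which can be compared with the printed catenoid areas.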

Example 3. For the functional

$$J[y] = \int_a^b (x - y)^2\,dx, \tag{19}$$

Euler’s equation reduces to a finite equation (see Case 3), whose solution
is the straight line $y = x$. In fact, the integral (19) vanishes along this line.

5. The Case of Several Variables 

So far, we have considered functionals depending on functions of one 
variable, i.e., on curves. In many problems, however, one encounters 
functionals depending on functions of several independent variables, i.e., on 
surfaces. Such multidimensional problems will be considered in detail in 
Chapter 7. For the time being, we merely give an idea of how the formulation
and solution of the simplest variational problem discussed above carries
over to the case of functionals depending on surfaces. 

To keep the notation simple, we confine ourselves to the case of two 
independent variables, but all our considerations remain the same when there 
are n independent variables. Thus, let $F(x, y, z, p, q)$ be a function with
continuous first and second (partial) derivatives with respect to all its arguments,
and consider a functional of the form

$$J[z] = \iint_R F(x, y, z, z_x, z_y)\,dx\,dy, \tag{20}$$

where $R$ is some closed region and $z_x$, $z_y$ are the partial derivatives of
$z = z(x, y)$. Suppose we are looking for a function $z(x, y)$ such that

1. $z(x, y)$ and its first and second derivatives are continuous in $R$;

2. $z(x, y)$ takes given values on the boundary $\Gamma$ of $R$;

3. The functional (20) has an extremum for $z = z(x, y)$.

Since the proof of Theorem 2 of Sec. 3.2 does not depend on the form of 
the functional J, then, just as in the case of one variable, a necessary condition 
for the functional (20) to have an extremum is that its variation (i.e., the
principal linear part of its increment) vanish. However, to find Euler’s
equation for the functional (20), we need the following lemma, which is
analogous to Lemma 1 of Sec. 3.1 (see also the remark on p. 9):

Lemma. If $\alpha(x, y)$ is a fixed function which is continuous in a closed
region $R$, and if the integral

$$\iint_R \alpha(x, y)\,h(x, y)\,dx\,dy \tag{21}$$

vanishes for every function $h(x, y)$ which has continuous first and second
derivatives in $R$ and equals zero on the boundary $\Gamma$ of $R$, then $\alpha(x, y) = 0$
everywhere in $R$.

Proof. Suppose the function $\alpha(x, y)$ is nonzero, say positive, at
some point in $R$. Then $\alpha(x, y)$ is also positive in some circle

$$(x - x_0)^2 + (y - y_0)^2 < \varepsilon^2 \tag{22}$$

contained in $R$, with center $(x_0, y_0)$ and radius $\varepsilon$. If we set $h(x, y) = 0$
outside the circle (22) and

$$h(x, y) = [(x - x_0)^2 + (y - y_0)^2 - \varepsilon^2]^3$$

inside the circle, then $h(x, y)$ satisfies the conditions of the lemma. However,
in this case, (21) reduces to an integral over the circle (22) and is
obviously nonzero (in fact negative, since $h < 0$ and $\alpha > 0$ inside the circle).
This contradiction proves the lemma.

In order to apply the necessary condition for an extremum of the functional
(20), i.e., $\delta J = 0$, we must first calculate the variation $\delta J$. Let $h(x, y)$ be an
arbitrary function which has continuous first and second derivatives in the
region $R$ and vanishes on the boundary $\Gamma$ of $R$. Then if $z(x, y)$ belongs to
the domain of definition of the functional (20), so does $z(x, y) + h(x, y)$.
Since

$$\Delta J = J[z + h] - J[z] = \iint_R [F(x, y, z + h, z_x + h_x, z_y + h_y) - F(x, y, z, z_x, z_y)]\,dx\,dy,$$

it follows by using Taylor’s theorem that

$$\Delta J = \iint_R (F_z h + F_{z_x} h_x + F_{z_y} h_y)\,dx\,dy + \cdots,$$

where the dots denote terms of order higher than 1 relative to $h$, $h_x$ and $h_y$.
The integral on the right represents the principal linear part of the increment
$\Delta J$, and hence the variation of $J[z]$ is

$$\delta J = \iint_R (F_z h + F_{z_x} h_x + F_{z_y} h_y)\,dx\,dy.$$

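Next, we observe that

$$\iint_R (F_{z_x} h_x + F_{z_y} h_y)\,dx\,dy
= \iint_R \left[\frac{\partial}{\partial x}(F_{z_x} h) + \frac{\partial}{\partial y}(F_{z_y} h)\right] dx\,dy
- \iint_R \left(\frac{\partial}{\partial x} F_{z_x} + \frac{\partial}{\partial y} F_{z_y}\right) h\,dx\,dy$$

$$= \int_\Gamma (F_{z_x} h\,dy - F_{z_y} h\,dx)
- \iint_R \left(\frac{\partial}{\partial x} F_{z_x} + \frac{\partial}{\partial y} F_{z_y}\right) h\,dx\,dy,$$

where in the last step we have used Green’s theorem¹¹

$$\iint_R \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right) dx\,dy = \int_\Gamma (P\,dx + Q\,dy).$$

The integral along $\Gamma$ is zero, since $h(x, y)$ vanishes on $\Gamma$, and hence, comparing
the last two formulas, we find that

$$\delta J = \iint_R \left(F_z - \frac{\partial}{\partial x} F_{z_x} - \frac{\partial}{\partial y} F_{z_y}\right) h(x, y)\,dx\,dy. \tag{23}$$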


¹¹ See e.g., D. V. Widder, Advanced Calculus, second edition, Prentice-Hall, Inc.,
Englewood Cliffs, N.J. (1961), p. 223.

Thus, the condition $\delta J = 0$ implies that the double integral (23) vanishes for
any $h(x, y)$ satisfying the stipulated conditions. According to the lemma,
this leads to the following second-order partial differential equation, again
known as Euler's equation:

$$F_z - \frac{\partial}{\partial x} F_{z_x} - \frac{\partial}{\partial y} F_{z_y} = 0. \tag{24}$$

We are looking for a solution of (24) which takes given values on the
boundary $\Gamma$.

Example. Find the surface of least area spanned by a given contour.

This problem reduces to finding the minimum of the functional

$$J[z] = \iint_R \sqrt{1 + z_x^2 + z_y^2}\,dx\,dy,$$

so that Euler’s equation has the form

$$r(1 + q^2) - 2spq + t(1 + p^2) = 0, \tag{25}$$

where

$$p = z_x, \quad q = z_y, \quad r = z_{xx}, \quad s = z_{xy}, \quad t = z_{yy}.$$

Equation (25) has a simple geometric meaning, which we explain by using
the formula

$$M = \frac{1}{2}(\kappa_1 + \kappa_2) = \frac{Eg - 2Ff + Ge}{2(EG - F^2)}$$

for the mean curvature of the surface, where $E$, $F$, $G$ and $e$, $f$, $g$ are the
coefficients of the first and second fundamental quadratic forms of the
surface.¹² If the surface is given by an explicit equation of the form
$z = z(x, y)$, then

$$E = 1 + p^2, \qquad F = pq, \qquad G = 1 + q^2,$$

$$e = \frac{r}{\sqrt{1 + p^2 + q^2}}, \qquad f = \frac{s}{\sqrt{1 + p^2 + q^2}}, \qquad g = \frac{t}{\sqrt{1 + p^2 + q^2}},$$

and hence

$$M = \frac{(1 + p^2)t - 2spq + (1 + q^2)r}{2(1 + p^2 + q^2)^{3/2}}.$$

Here, the numerator coincides with the left-hand side of Euler’s equation 
(25). Thus, (25) implies that the mean curvature of the required surface 
equals zero. Surfaces with zero mean curvature are called minimal surfaces. 
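Equation (25) can be tested on explicit surfaces. The sketch below is an added illustration: it approximates $p, q, r, s, t$ by central differences and evaluates the left-hand side of (25). The helicoid $z = \arctan(y/x)$, a classical minimal surface chosen here as an assumption, should give a residual near zero, while the paraboloid $z = x^2 + y^2$ should not. The sample point and step size are arbitrary choices.

```python
import math

def euler_residual(z, x, y, h=1e-3):
    """Left-hand side r(1 + q^2) - 2 s p q + t(1 + p^2) of (25), by central differences."""
    p = (z(x + h, y) - z(x - h, y)) / (2 * h)
    q = (z(x, y + h) - z(x, y - h)) / (2 * h)
    r = (z(x + h, y) - 2 * z(x, y) + z(x - h, y)) / h ** 2
    t = (z(x, y + h) - 2 * z(x, y) + z(x, y - h)) / h ** 2
    s = (z(x + h, y + h) - z(x + h, y - h)
         - z(x - h, y + h) + z(x - h, y - h)) / (4 * h ** 2)
    return r * (1 + q ** 2) - 2 * s * p * q + t * (1 + p ** 2)

helicoid = lambda x, y: math.atan2(y, x)      # z = arctan(y / x) for x > 0
paraboloid = lambda x, y: x ** 2 + y ** 2     # not a minimal surface

res_helicoid = euler_residual(helicoid, 1.0, 0.5)
res_paraboloid = euler_residual(paraboloid, 1.0, 0.5)
print(res_helicoid, res_paraboloid)
```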


¹² See e.g., D. V. Widder, op. cit., Chap. 3, Sec. 6, and E. Kreyszig, Differential
Geometry, University of Toronto Press, Toronto (1959), Chap. 4. Here, $\kappa_1$ and $\kappa_2$ denote
the principal normal curvatures of the surface.

6. A Simple Variable End Point Problem


There are, of course, many other kinds of variational problems besides 
the “simplest” variational problem considered so far, and such problems 
will be studied in Chapters 2 and 3. However, this is a suitable place for 
acquainting the reader with one of these problems, i.e., the variable end
point problem, a particular case of which can be stated as follows: Among all
curves whose end points lie on two given vertical lines $x = a$ and $x = b$,
find the curve for which the functional

$$J[y] = \int_a^b F(x, y, y')\,dx \tag{26}$$

has an extremum.¹³

We begin by calculating the variation $\delta J$ of the functional (26). As
before, $\delta J$ means the principal linear part of the increment

$$\Delta J = J[y + h] - J[y] = \int_a^b [F(x, y + h, y' + h') - F(x, y, y')]\,dx.$$

Using Taylor’s theorem to expand the integrand, we obtain

$$\Delta J = \int_a^b (F_y h + F_{y'} h')\,dx + \cdots,$$

where the dots denote terms of order higher than 1 relative to $h$ and $h'$, and
hence

$$\delta J = \int_a^b (F_y h + F_{y'} h')\,dx.$$

Here, unlike the fixed end point problem, $h(x)$ need no longer vanish at the
points $a$ and $b$, so that integration by parts now gives¹⁴

$$\delta J = \int_a^b \left(F_y - \frac{d}{dx} F_{y'}\right) h(x)\,dx + F_{y'}\,h(x)\Big|_{x=a}^{x=b}
= \int_a^b \left(F_y - \frac{d}{dx} F_{y'}\right) h(x)\,dx + F_{y'}\big|_{x=b}\,h(b) - F_{y'}\big|_{x=a}\,h(a). \tag{27}$$


We first consider functions $h(x)$ such that $h(a) = h(b) = 0$. Then, as in
the simplest variational problem, the condition $\delta J = 0$ implies that

$$F_y - \frac{d}{dx} F_{y'} = 0. \tag{28}$$


Therefore, in order for the curve y = y(x) to be a solution of the variable 
end point problem, y must be an extremal, i.e., a solution of Euler’s equation. 


¹³ The more general case where the end points lie on two given curves $y = \varphi(x)$ and
$y = \psi(x)$ is treated in Sec. 14.

¹⁴ As usual, $f(x)\big|_{x=a}^{x=b}$ stands for $f(b) - f(a)$.

But if $y$ is an extremal, the integral in the expression (27) for $\delta J$ vanishes,
and then the condition $\delta J = 0$ takes the form

$$F_{y'}\big|_{x=b}\,h(b) - F_{y'}\big|_{x=a}\,h(a) = 0,$$

from which it follows that

$$F_{y'}\big|_{x=a} = 0, \qquad F_{y'}\big|_{x=b} = 0, \tag{29}$$

since $h(x)$ is arbitrary. Thus, to solve the variable end point problem, we
must first find a general integral of Euler’s equation (28), and then use the
conditions (29), sometimes called the natural boundary conditions, to determine
the values of the arbitrary constants.

Besides the case of fixed end points and the case of variable end points, 
we can also consider the mixed case, where one end is fixed and the other is
variable. For example, suppose we are looking for an extremum of the
functional (26) with respect to the class of curves joining a given point $A$
(with abscissa $a$) and an arbitrary point of the line $x = b$. In this case, the
conditions (29) reduce to the single condition

$$F_{y'}\big|_{x=b} = 0,$$

and $y(a) = A$ serves as the second boundary condition.
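The mixed case can be illustrated with a small computation. The sketch below assumes the integrand $F = y'^2 + y^2$ on $[0, 1]$ with $y(0) = 1$ and a free right end; the integrand, the scan range and the names are choices made here for illustration only. The extremals through $(0, 1)$ are $y = \cosh x + c \sinh x$, and the natural boundary condition $F_{y'}|_{x=1} = 2y'(1) = 0$ singles out $c = -\tanh 1$. The code scans $c$ and locates the minimum of $J$ numerically.

```python
import math

def J(c, n=400):
    """Midpoint-rule value of the integral of y'^2 + y^2 over [0, 1] for y = cosh x + c sinh x."""
    dx = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * dx
        ch, sh = math.cosh(x), math.sinh(x)
        y = ch + c * sh
        dy = sh + c * ch
        total += (dy * dy + y * y) * dx
    return total

# Scan the one-parameter family of extremals through (0, 1).
cs = [-1.5 + 0.001 * k for k in range(1501)]       # c from -1.5 to 0.0
c_best = min(cs, key=J)

# Natural boundary condition F_{y'} = 2 y'(1) = 0 gives c = -tanh(1).
c_natural = -math.tanh(1.0)
print(c_best, c_natural)
```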


Example. Starting from the point $P = (a, A)$, a heavy particle slides
down a curve in the vertical plane. Find the curve such that the particle
reaches the vertical line $x = b$ $(\neq a)$ in the shortest time. (This is a variant
of the brachistochrone problem, p. 3.)

For simplicity, we assume that the original point coincides with the origin
of coordinates. Since the velocity of motion along the curve equals

$$v = \frac{ds}{dt} = \sqrt{1 + y'^2}\,\frac{dx}{dt},$$

we have

$$dt = \frac{\sqrt{1 + y'^2}}{v}\,dx = \frac{\sqrt{1 + y'^2}}{\sqrt{2gy}}\,dx,$$

so that the transit time $T$ is given by the equation

$$T = \int_0^b \frac{\sqrt{1 + y'^2}}{\sqrt{2gy}}\,dx.$$

The general solution of the corresponding Euler equation consists of a
family of cycloids

$$x = r(\theta - \sin\theta) + c, \qquad y = r(1 - \cos\theta).$$

Since the curve must pass through the origin, we must have $c = 0$. To
determine $r$, we use the second condition

$$F_{y'} = \frac{y'}{\sqrt{2gy}\,\sqrt{1 + y'^2}} = 0 \quad \text{for } x = b,$$

i.e., $y' = 0$ for $x = b$, which means that the tangent to the curve at its right
end point must be horizontal. It follows that $r = b/\pi$, and hence the
required curve is given by the equations

$$x = \frac{b}{\pi}(\theta - \sin\theta), \qquad y = \frac{b}{\pi}(1 - \cos\theta).$$
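The value $r = b/\pi$ can be cross-checked by a direct scan. The sketch below relies on the standard fact that along a cycloid $ds/v = \sqrt{r/g}\,d\theta$, so the transit time to the parameter value $\theta_b$ is $\theta_b\sqrt{r/g}$; the choice $b = 1$, the value of $g$ and the scan grid are assumptions made for illustration. Among all cycloids through the origin, it looks for the one reaching the line $x = b$ fastest.

```python
import math

def transit_time(theta_b, b=1.0, g=9.81):
    """Time to slide from the origin to the line x = b along the cycloid
    x = r(theta - sin theta), y = r(1 - cos theta) that reaches x = b at theta = theta_b.

    Along a cycloid ds / v = sqrt(r / g) d(theta), so T = theta_b * sqrt(r / g).
    """
    r = b / (theta_b - math.sin(theta_b))
    return theta_b * math.sqrt(r / g)

thetas = [0.01 * k for k in range(50, 601)]        # theta_b from 0.5 to 6.0
theta_best = min(thetas, key=transit_time)
r_best = 1.0 / (theta_best - math.sin(theta_best))
print(theta_best, math.pi, r_best, 1.0 / math.pi)
```

The minimizing parameter lies at $\theta_b \approx \pi$, where the cycloid has a horizontal tangent, in agreement with the natural boundary condition.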


7. The Variational Derivative 

In Sec. 3.2 we introduced the concept of the differential of a functional.
We now introduce the concept of the variational (or functional) derivative,
which plays the same role for functionals as the concept of the partial
derivative plays for functions of n variables. We begin by considering
functionals of the type

$$J[y] = \int_a^b F(x, y, y')\,dx, \qquad y(a) = A, \quad y(b) = B, \tag{30}$$

corresponding to the simplest variational problem. Our approach is to
first go from the variational problem to an n-dimensional problem, and then
pass to the limit $n \to \infty$.

Thus, we divide the interval $[a, b]$ into $n + 1$ equal subintervals by
introducing the points

$$x_0 = a, \; x_1, \; \ldots, \; x_n, \; x_{n+1} = b, \qquad (x_{i+1} - x_i = \Delta x),$$

and we replace the smooth function $y(x)$ by the polygonal line with vertices

$$(x_0, y_0), \; (x_1, y_1), \; \ldots, \; (x_n, y_n), \; (x_{n+1}, y_{n+1}),$$

where $y_i = y(x_i)$.¹⁵ Then (30) can be approximated by the sum

$$J(y_1, \ldots, y_n) \equiv \sum_{i=0}^{n} F\left(x_i, y_i, \frac{y_{i+1} - y_i}{\Delta x}\right) \Delta x, \tag{31}$$

which is a function of n variables. (Recall that $y_0 = A$ and $y_{n+1} = B$ are
fixed.)

¹⁵ This is the method of finite differences (cf. Secs. 1, 40).

Next, we calculate the partial derivatives

$$\frac{\partial J(y_1, \ldots, y_n)}{\partial y_k},$$

and we consider what happens to these derivatives as the number of points
of subdivision increases without limit. Observing that each variable $y_k$
in (31) appears in just two terms, corresponding to $i = k$ and $i = k - 1$,
we find that

$$\frac{\partial J}{\partial y_k} = F_y\left(x_k, y_k, \frac{y_{k+1} - y_k}{\Delta x}\right) \Delta x
+ F_{y'}\left(x_{k-1}, y_{k-1}, \frac{y_k - y_{k-1}}{\Delta x}\right)
- F_{y'}\left(x_k, y_k, \frac{y_{k+1} - y_k}{\Delta x}\right). \tag{32}$$

As $\Delta x \to 0$, i.e., as the number of points of subdivision increases without
limit, the right-hand side of (32) obviously goes to zero, since it is a quantity
of order $\Delta x$. In order to obtain a limit which is in general nonzero as
$\Delta x \to 0$, we divide (32) by $\Delta x$, obtaining

$$\frac{\partial J}{\partial y_k\,\Delta x} = F_y\left(x_k, y_k, \frac{y_{k+1} - y_k}{\Delta x}\right)
- \frac{1}{\Delta x}\left[F_{y'}\left(x_k, y_k, \frac{y_{k+1} - y_k}{\Delta x}\right)
- F_{y'}\left(x_{k-1}, y_{k-1}, \frac{y_k - y_{k-1}}{\Delta x}\right)\right]. \tag{33}$$

We note that the expression $\partial y_k\,\Delta x$ appearing in the denominator on the left
has a direct geometric meaning, and is in fact just the area of the region
lying between the solid and the dashed curves in Figure 3.



As $\Delta x \to 0$, the expression (33) converges to the limit

$$\frac{\delta J}{\delta y} \equiv F_y(x, y, y') - \frac{d}{dx} F_{y'}(x, y, y'),$$

called the variational derivative of the functional (30). We see that the
variational derivative $\delta J/\delta y$ is just the left-hand side of Euler’s equation
(28), and hence the meaning of Euler’s equation is just that the variational 
derivative of the functional under consideration should vanish at every point. 
This is the analog of the situation encountered in elementary analysis, where 
a necessary condition for a function of n variables to have an extremum is 
that all its partial derivatives vanish. 
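The passage from the finite-dimensional sum to the variational derivative can be watched numerically. The sketch below assumes the integrand $F = y'^2 + y^2$ and the curve $y = \sin x$ on $[0, \pi]$, both chosen here for illustration. It computes the scaled partial derivative $\partial J/(\partial y_k\,\Delta x)$ of the discrete sum by a central difference in $y_k$ and compares it with $F_y - \frac{d}{dx}F_{y'} = 2y - 2y'' = 4\sin x$.

```python
import math

def J_discrete(ys, a, b):
    """The finite sum approximating J for F = y'^2 + y^2, with ys = [y_0, ..., y_{n+1}]."""
    dx = (b - a) / (len(ys) - 1)
    total = 0.0
    for i in range(len(ys) - 1):
        slope = (ys[i + 1] - ys[i]) / dx
        total += (slope ** 2 + ys[i] ** 2) * dx
    return total

a, b, n = 0.0, math.pi, 400
dx = (b - a) / (n + 1)
xs = [a + i * dx for i in range(n + 2)]
ys = [math.sin(x) for x in xs]

def scaled_partial(k, eps=1e-6):
    """(dJ / dy_k) / dx, with the partial derivative taken by a central difference."""
    up = ys[:k] + [ys[k] + eps] + ys[k + 1:]
    down = ys[:k] + [ys[k] - eps] + ys[k + 1:]
    return (J_discrete(up, a, b) - J_discrete(down, a, b)) / (2.0 * eps * dx)

k = (n + 1) // 2                       # an interior grid point near x = pi / 2
approx = scaled_partial(k)
exact = 4.0 * math.sin(xs[k])          # F_y - d/dx F_{y'} = 2y - 2y'' for y = sin x
print(approx, exact)
```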

In the general case, the variational derivative is defined as follows: Let
$J[y]$ be a functional depending on the function $y(x)$, and suppose we give
$y(x)$ an increment $h(x)$ which is different from zero only in the neighborhood
of a point $x_0$. Dividing the corresponding increment $J[y + h] - J[y]$ of
the functional by the area $\Delta\sigma$ lying between the curve $y = h(x)$ and the
x-axis,¹⁶ we obtain the ratio

$$\frac{J[y + h] - J[y]}{\Delta\sigma}. \tag{34}$$

Next, we let the area $\Delta\sigma$ go to zero in such a way that both $\max |h(x)|$ and
the length of the interval in which $h(x)$ is nonvanishing go to zero. Then,
if the ratio (34) converges to a limit as $\Delta\sigma \to 0$, this limit is called the
variational derivative of the functional $J[y]$ at the point $x_0$ [for the curve
$y = y(x)$], and is denoted by

$$\frac{\delta J}{\delta y}\bigg|_{x = x_0}.$$


It can be shown that the analogs of all the familiar rules obeyed by ordinary 
derivatives (e.g., the formulas for differentiating sums and products of functions,
composite functions, etc.) are valid for variational derivatives.

Remark. It is clear from the definition of the variational derivative
that if $h(x)$ is different from zero in a neighborhood of the point $x_0$ and if
$\Delta\sigma$ is the area between the curve $y = h(x)$ and the x-axis, then

$$\Delta J \equiv J[y + h] - J[y] = \left\{\frac{\delta J}{\delta y}\bigg|_{x = x_0} + \varepsilon\right\} \Delta\sigma,$$

where $\varepsilon \to 0$ as both $\max |h(x)|$ and the length of the interval in which $h(x)$
is nonvanishing go to zero. It follows that in terms of the variational
derivative, the differential or variation of the functional $J[y]$ at the point $x_0$
[for the curve $y = y(x)$] is given by the formula

$$\delta J = \frac{\delta J}{\delta y}\bigg|_{x = x_0} \Delta\sigma.$$


¹⁶ $\Delta\sigma$ can also be regarded as the area between the curves $y = y(x)$ and $y = y(x) + h(x)$.

8. Invariance of Euler’s Equation

Suppose that instead of the rectangular plane coordinates $x$ and $y$, we
introduce curvilinear coordinates $u$ and $v$, where

$$x = x(u, v), \quad y = y(u, v), \qquad \begin{vmatrix} x_u & x_v \\ y_u & y_v \end{vmatrix} \neq 0. \tag{35}$$

Then the curve given by the equation $y = y(x)$ in the xy-plane corresponds
to the curve given by some equation

$$v = v(u)$$

in the uv-plane. When we make the change of variables (35), the functional

$$J[y] = \int_a^b F(x, y, y')\,dx$$

goes into the functional

$$J_1[v] = \int_{a_1}^{b_1} F\left[x(u, v), y(u, v), \frac{y_u + y_v v'}{x_u + x_v v'}\right](x_u + x_v v')\,du
= \int_{a_1}^{b_1} F_1(u, v, v')\,du,$$

where

$$F_1(u, v, v') = F\left[x(u, v), y(u, v), \frac{y_u + y_v v'}{x_u + x_v v'}\right](x_u + x_v v').$$
We now show that if $y = y(x)$ satisfies the Euler equation

$$\frac{\partial F}{\partial y} - \frac{d}{dx}\frac{\partial F}{\partial y'} = 0 \tag{36}$$

corresponding to the original functional $J[y]$, then $v = v(u)$ satisfies the
Euler equation

$$\frac{\partial F_1}{\partial v} - \frac{d}{du}\frac{\partial F_1}{\partial v'} = 0 \tag{37}$$

corresponding to the new functional $J_1[v]$. To prove this, we use the concept
of the variational derivative, introduced in the preceding section. Let $\Delta\sigma$
denote the area bounded by the curves $y = y(x)$ and $y = y(x) + h(x)$, and
let $\Delta\sigma_1$ denote the area bounded by the corresponding curves $v = v(u)$ and
$v = v(u) + \eta(u)$ in the uv-plane. By the standard formula for the transformation
of areas, the limit as $\Delta\sigma, \Delta\sigma_1 \to 0$ of the ratio $\Delta\sigma/\Delta\sigma_1$ approaches
the Jacobian

$$\begin{vmatrix} x_u & x_v \\ y_u & y_v \end{vmatrix},$$

which by hypothesis is nonzero. Thus, if

$$\lim_{\Delta\sigma \to 0} \frac{J[y + h] - J[y]}{\Delta\sigma} = 0,$$

then

$$\lim_{\Delta\sigma_1 \to 0} \frac{J_1[v + \eta] - J_1[v]}{\Delta\sigma_1} = 0$$

as well. It follows that $v(u)$ satisfies (37) if $y(x)$ satisfies (36). In other
words, whether or not a curve is an extremal is a property which is independent
of the choice of the coordinate system.

In solving Euler’s equation, changes of variables can often be used to 
advantage. Because of the invariance property just proved, the change of 
variables can be made directly in the integral representing the functional 
rather than in Euler’s equation, and we can then write Euler’s equation for the
new integral.

Example. Suppose we are looking for the extremals of the functional

$$J[r] = \int_{\varphi_0}^{\varphi_1} \sqrt{r^2 + r'^2}\,d\varphi, \tag{38}$$

where $r = r(\varphi)$. The corresponding Euler equation has the form

$$\frac{r}{\sqrt{r^2 + r'^2}} - \frac{d}{d\varphi}\,\frac{r'}{\sqrt{r^2 + r'^2}} = 0.$$

The change of variables

$$x = r\cos\varphi, \qquad y = r\sin\varphi$$

transforms (38) into an integral of the form

$$\int_{x_0}^{x_1} \sqrt{1 + y'^2}\,dx,$$

which has the Euler equation

$$y'' = 0,$$

with general solution

$$y = \alpha x + \beta.$$

Therefore, the solution of the Euler equation for (38) is

$$r\sin\varphi = \alpha r\cos\varphi + \beta.$$
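The invariance property can be checked on this example by substituting the straight line, written in polar form, into the polar Euler equation. In the sketch below the values of $\alpha$ and $\beta$, the sample angles and the function names are assumptions made for illustration; the derivative $r'(\varphi)$ is exact, and only the outer $d/d\varphi$ is taken by a central difference.

```python
import math

ALPHA, BETA = 1.0, 1.0      # hypothetical slope and intercept of the line y = ALPHA * x + BETA

def r(phi):
    """Polar form of the line: r sin(phi) = ALPHA r cos(phi) + BETA."""
    return BETA / (math.sin(phi) - ALPHA * math.cos(phi))

def dr(phi):
    """Exact derivative r'(phi)."""
    denom = math.sin(phi) - ALPHA * math.cos(phi)
    return -BETA * (math.cos(phi) + ALPHA * math.sin(phi)) / denom ** 2

def F_r(phi):
    return r(phi) / math.hypot(r(phi), dr(phi))

def F_dr(phi):
    return dr(phi) / math.hypot(r(phi), dr(phi))

def euler_residual(phi, h=1e-5):
    """F_r - d/dphi F_{r'}, with the outer derivative taken by a central difference."""
    return F_r(phi) - (F_dr(phi + h) - F_dr(phi - h)) / (2.0 * h)

phis = [1.2 + 0.2 * k for k in range(8)]     # angles inside (pi/4, 5 pi/4), where r > 0
residuals = [euler_residual(phi) for phi in phis]
print(max(abs(v) for v in residuals))
```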


PROBLEMS 

1. Use the method of finite differences (Sec. 1) to find the shortest plane curve 
joining two points A and B. 

2. A set $\mathcal{M}$ in a normed linear space $\mathcal{R}$ is said to be convex if $\mathcal{M}$ contains all
elements of the form $\alpha x + \beta y$, where $\alpha, \beta \geq 0$, $\alpha + \beta = 1$, provided that $\mathcal{M}$
contains $x$ and $y$. Prove that the set of all elements $x \in \mathcal{R}$ satisfying the
inequality $\|x - x_0\| < c$, where $x_0$ is a fixed element of $\mathcal{R}$ and $c > 0$, is convex.

3. Show that the set of all continuous functions defined on the
interval $[a, b]$, equipped with the norm

$$\|y\| = \left\{\int_a^b |y(x)|^2\,dx\right\}^{1/2},$$

forms a normed linear space.

4. An infinite sequence $y_1, y_2, \ldots$ of elements of a normed
linear space $\mathcal{R}$ is called a Cauchy sequence (or fundamental sequence) if, given
any $\varepsilon > 0$, there exists an integer $N = N(\varepsilon)$ such that $\|y_m - y_n\| < \varepsilon$, provided
that $m > N$, $n > N$. A normed linear space $\mathcal{R}$ is said to be complete if every
Cauchy sequence in $\mathcal{R}$ converges to some element in $\mathcal{R}$. Prove that the space
introduced in the preceding problem is not complete, but that the space
$\mathcal{C}(a, b)$ introduced in Sec. 2 is complete.

Comment. See e.g., G. E. Shilov, op. cit., p. 249.

5. Prove that any norm defined on a linear space $\mathcal{R}$ is a continuous functional
on $\mathcal{R}$.

6. Suppose the norm of the space $\mathcal{D}_n(a, b)$ is defined as

$$\|y\| = \max_{a \leq x \leq b} \{|y(x)|, |y'(x)|, \ldots, |y^{(n)}(x)|\},$$

instead of

$$\|y\| = \sum_{i=0}^{n} \max_{a \leq x \leq b} |y^{(i)}(x)|,$$

as on p. 7. Prove that any functional on $\mathcal{D}_n(a, b)$ which is continuous with
respect to one of these norms is continuous with respect to the other.

7. Let $J[y]$ be the arc-length functional, defined for all $y \in \mathcal{D}_1(a, b)$. Show
that $J[y]$ is lower semicontinuous with respect to the norm of the space
$\mathcal{C}(a, b)$.

Comment. As remarked in footnote 5, p. 8, $J[y]$ is not continuous with
respect to the norm of $\mathcal{C}(a, b)$.

8. Let $\varphi[h]$ be a linear functional defined on a normed linear space $\mathcal{R}$. Prove
that if $\varphi[h]$ is continuous for h = 0, it is continuous for all $h \in \mathcal{R}$.

9. Prove that a linear functional $\varphi[h]$ cannot have an extremum unless

$$\varphi[h] \equiv 0.$$

10. Prove that if two linear functionals $\varphi[h]$ and $\psi[h]$ defined on the same
space vanish on the same set of elements, then $\varphi[h] = \lambda \psi[h]$, where $\lambda$ is a
constant.


11. Show that constants $c_0$ and $c_1$ can always be chosen satisfying the
conditions (7) used to prove Lemma 3, p. 10.

12. Prove that the square of a differentiable functional is differentiable, and 
write a formula for its differential (variation). 

13. Prove that if two differentiable functionals defined on the same normed 
linear space have the same differential at every point of the space, then they 
differ by a constant. 

14. Analyze the variational problems corresponding to the following functionals,
where in each case y(0) = 0, y(1) = 1:

a) $\int_0^1 y\, dx$;   b) $\int_0^1 y y'\, dx$;   c) $\int_0^1 x y y'\, dx$.


15. Find the extremals of the following functionals:

a) $\int_a^b (y^2 + y'^2 - 2y \sin x)\, dx$;   b) $\int_a^b \frac{y'^2}{x^3}\, dx$;

c) $\int_a^b (y^2 - y'^2 - 2y \cosh x)\, dx$;   d) $\int_a^b (y^2 + y'^2 + 2y e^x)\, dx$;

e) $\int_a^b (y^2 - y'^2 - 2y \sin x)\, dx$.

Ans. b) $y = C_1 x^4 + C_2$;

d) $y = \tfrac{1}{2} x e^x + C_1 e^x + C_2 e^{-x}$.
16. Prove the uniqueness part of Bernstein’s theorem (p. 16).

Hint. Let $\Delta(x) = \varphi_2(x) - \varphi_1(x)$, where $\varphi_1(x)$ and $\varphi_2(x)$ are two solutions
of (15), write an expression for $\Delta'(x)$ and use the condition $F_y(x, y, y') > k$.

17. Prove that one and only one extremal of each of the functionals

$$\int e^{-2y^2} (y'^2 - 1)\, dx, \qquad \int \left( y^2 + y' \tan^{-1} y' - \ln \sqrt{1 + y'^2} \right) dx$$

passes through any two points of the plane with different abscissas.

Hint. Apply Bernstein’s theorem.


18. Find the general solution of the Euler equation corresponding to the
functional

$$J[y] = \int_a^b f(x) \sqrt{1 + y'^2}\, dx,$$

and investigate the special cases $f(x) = \sqrt{x}$ and $f(x) = x$.

Comment. The case $f(x) = 1/x$ is treated in Example 1, p. 19.

19. Find all minimal surfaces whose equations have the form $z = \varphi(x) + \psi(y)$.

Ans. $z = Ax + By + C$, $\; e^{a(z - z_0)} = \dfrac{\cos a(y - y_0)}{\cos a(x - x_0)}$.

20. Which curve minimizes the integral

$$\int_0^1 \left( \tfrac{1}{2} y'^2 + y y' + y' + y \right) dx,$$

when the values of y are not specified at the end points?

Ans. $y = \tfrac{1}{2}(x^2 - 3x + 1)$.

21. Calculate the variational derivative at the point $x_0$ of the quadratic
functional

$$J[y] = \int_a^b \int_a^b K(s, t)\, y(s)\, y(t)\, ds\, dt.$$

22. Find the extremals of the functional

$$\int \sqrt{x^2 + y^2}\, \sqrt{1 + y'^2}\, dx.$$

Hint. Use polar coordinates.

Ans. $x^2 \cos\alpha + 2xy \sin\alpha - y^2 \cos\alpha = \beta$, where $\alpha$ and $\beta$ are constants.
FURTHER GENERALIZATIONS


In this chapter, we consider some further generalizations of the simplest
variational problem. These include variational problems in spaces of dimension
greater than two (Sec. 9), problems in parametric form (Sec. 10),
problems involving higher derivatives (Sec. 11), and problems with subsidiary
conditions (Sec. 12).

9. The Fixed End Point Problem for n Unknown Functions 

Let $F(x, y_1, \ldots, y_n, z_1, \ldots, z_n)$ be a function with continuous first and
second (partial) derivatives with respect to all its arguments. Consider
the problem of finding necessary conditions for an extremum of a functional
of the form

$$J[y_1, \ldots, y_n] = \int_a^b F(x, y_1, \ldots, y_n, y_1', \ldots, y_n')\, dx, \tag{1}$$

which depends on n continuously differentiable functions $y_1(x), \ldots, y_n(x)$
satisfying the boundary conditions

$$y_i(a) = A_i, \quad y_i(b) = B_i \qquad (i = 1, \ldots, n). \tag{2}$$

In other words, we are looking for an extremum of the functional (1) defined
on the set of smooth curves joining two fixed points in (n + 1)-dimensional
Euclidean space $\mathcal{E}_{n+1}$. The problem of finding geodesics, i.e.,
shortest curves joining two points of some manifold, is of this type. The
same kind of problem arises in geometric optics, in finding the paths along
which light rays propagate in an inhomogeneous medium. In fact, according
to Fermat's principle, light goes from a point $P_0$ to a point $P_1$ along the
path for which the transit time is the smallest.

To find necessary conditions for the functional (1) to have an extremum,
we first calculate its variation. Suppose we replace each $y_i(x)$ by a “varied”
function $y_i(x) + h_i(x)$. By the variation $\delta J$ of the functional $J[y_1, \ldots, y_n]$,
we mean the expression which is linear in $h_i$, $h_i'$ (i = 1, ..., n) and differs
from the increment

$$\Delta J = J[y_1 + h_1, \ldots, y_n + h_n] - J[y_1, \ldots, y_n]$$

by a quantity of order higher than 1 relative to $h_i$, $h_i'$ (i = 1, ..., n). Since
both $y_i(x)$ and $y_i(x) + h_i(x)$ satisfy the boundary conditions (2), for each i,
it is clear that

$$h_i(a) = h_i(b) = 0 \qquad (i = 1, \ldots, n).$$

We now use Taylor’s theorem, obtaining

$$\Delta J = \int_a^b [F(x, \ldots, y_i + h_i, y_i' + h_i', \ldots) - F(x, \ldots, y_i, y_i', \ldots)]\, dx = \int_a^b \sum_{i=1}^{n} (F_{y_i} h_i + F_{y_i'} h_i')\, dx + \cdots,$$

where the dots denote terms of order higher than 1 relative to $h_i$, $h_i'$
(i = 1, ..., n). The last integral on the right represents the principal
linear part of the increment $\Delta J$, and hence the variation of $J[y_1, \ldots, y_n]$ is

$$\delta J = \int_a^b \sum_{i=1}^{n} (F_{y_i} h_i + F_{y_i'} h_i')\, dx.$$

Since all the increments $h_i(x)$ are independent, we can choose one of them
quite arbitrarily (as long as the boundary conditions are satisfied), setting
all the others equal to zero. Therefore, the necessary condition $\delta J = 0$ for
an extremum implies

$$\int_a^b (F_{y_i} h_i + F_{y_i'} h_i')\, dx = 0 \qquad (i = 1, \ldots, n).$$

Using Lemma 4 of Sec. 3.1, we obtain the following system of Euler
equations:

$$F_{y_i} - \frac{d}{dx} F_{y_i'} = 0 \qquad (i = 1, \ldots, n). \tag{3}$$

Since (3) is a system of n second-order differential equations, its general
solution contains 2n arbitrary constants, which are determined from the
boundary conditions (2). Thus, we have proved the following

Theorem. A necessary condition for the curve

$$y_i = y_i(x) \qquad (i = 1, \ldots, n)$$

to be an extremal of the functional

$$\int_a^b F(x, y_1, \ldots, y_n, y_1', \ldots, y_n')\, dx$$

is that the functions $y_i(x)$ satisfy the Euler equations (3).
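
As a quick illustration of how the system (3) is formed, consider an integrand chosen here purely for illustration (it does not appear in the text): $F = \tfrac{1}{2} \sum_i y_i'^2 - U(y_1, \ldots, y_n)$, where U is an assumed smooth function. The sketch below writes out the two partial derivatives entering (3) and the resulting equations.

```latex
% Illustrative integrand (an assumption, not taken from the text):
%   F = (1/2) * sum_i (y_i')^2 - U(y_1, ..., y_n),  U smooth.
\begin{align*}
F_{y_i} &= -\frac{\partial U}{\partial y_i}, &
F_{y_i'} &= y_i', \\
F_{y_i} - \frac{d}{dx} F_{y_i'} &= -\frac{\partial U}{\partial y_i} - y_i'' = 0
 && (i = 1, \ldots, n).
\end{align*}
% For n = 2 and U = (1/2)(y_1^2 + y_2^2) the system decouples into
%   y_1'' + y_1 = 0,  y_2'' + y_2 = 0,
% whose general solution y_i = a_i cos x + b_i sin x has 2n = 4 constants.
```

The count of four constants in this special case agrees with the general statement that the solution of (3) contains 2n arbitrary constants.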
Remark 1. We have just shown how to find a well-defined system of
Euler equations (3) for every functional of the form (1). However, two
different integrands F can lead to the same set of Euler equations. In fact,
let

$$\Phi = \Phi(x, y_1, \ldots, y_n)$$

be any twice differentiable function, and let

$$\Psi(x, y_1, \ldots, y_n, y_1', \ldots, y_n') = \frac{\partial \Phi}{\partial x} + \sum_{i=1}^{n} \frac{\partial \Phi}{\partial y_i}\, y_i'. \tag{4}$$

Then we find at once by direct calculation that

$$\frac{\partial \Psi}{\partial y_i} - \frac{d}{dx} \left( \frac{\partial \Psi}{\partial y_i'} \right) \equiv 0,$$

and hence the functionals

$$\int_a^b F(x, y_1, \ldots, y_n, y_1', \ldots, y_n')\, dx \tag{5}$$

and

$$\int_a^b [F(x, y_1, \ldots, y_n, y_1', \ldots, y_n') + \Psi(x, y_1, \ldots, y_n, y_1', \ldots, y_n')]\, dx \tag{6}$$

lead to the same system of Euler equations.

Given any curve $y_i = y_i(x)$, the function (4) is just the derivative

$$\frac{d}{dx} \Phi[x, y_1(x), \ldots, y_n(x)].$$

Therefore, the integral

$$\int_a^b \Psi(x, y_1, \ldots, y_n, y_1', \ldots, y_n')\, dx = \int_a^b \frac{d\Phi}{dx}\, dx$$

takes the same value along all curves satisfying the boundary conditions (2).
In other words, the functionals (5) and (6), defined on the class of functions
satisfying (2), differ only by a constant. In particular, we can choose $\Phi$
in such a way that this constant vanishes (but $\Psi \not\equiv 0$).
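
The direct calculation mentioned above can be checked on a small case. The sketch below takes n = 1 and the arbitrarily chosen function $\Phi = x y^2$ (an assumption made only for illustration) and verifies that the corresponding $\Psi$ contributes nothing to the Euler equation.

```latex
\begin{align*}
\Phi &= x y^2, \qquad
\Psi = \frac{\partial \Phi}{\partial x} + \frac{\partial \Phi}{\partial y}\, y'
     = y^2 + 2 x y y', \\
\frac{\partial \Psi}{\partial y} &= 2y + 2x y', \qquad
\frac{\partial \Psi}{\partial y'} = 2xy, \qquad
\frac{d}{dx} \frac{\partial \Psi}{\partial y'} = 2y + 2x y', \\
\frac{\partial \Psi}{\partial y} - \frac{d}{dx} \frac{\partial \Psi}{\partial y'} &= 0 .
\end{align*}
% Along any curve, \int_a^b \Psi dx = [x y^2]_a^b = b B^2 - a A^2,
% which depends only on the end point data y(a) = A, y(b) = B.
```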

Remark 2. Two functionals are said to be equivalent if they have the
same extremals. According to Remark 1, two functionals of the form (1)
are equivalent if their integrands differ by a function of the form (4). It is
also clear that two functionals of this form are equivalent if their integrands
differ by a constant factor $c \neq 0$. More generally, the functional (5) is
equivalent to the functional (6) with F replaced by cF.

Example 1. Propagation of light in an inhomogeneous medium. Suppose
that three-dimensional space is filled with an optically inhomogeneous
medium, such that the velocity of propagation of light at each point is some
function v(x, y, z) of the coordinates of the point. According to Fermat’s
principle (see p. 34), light goes from one point to another along the curve
for which the transit time of the light is the smallest. If the curve joining
two points A and B is specified by the equations

$$y = y(x), \qquad z = z(x),$$

the time it takes light to traverse the curve equals

$$\int_a^b \frac{\sqrt{1 + y'^2 + z'^2}}{v(x, y, z)}\, dx.$$

Writing the system of Euler equations for this functional, i.e.,

$$\frac{\partial v}{\partial y} \frac{\sqrt{1 + y'^2 + z'^2}}{v^2} + \frac{d}{dx} \frac{y'}{v \sqrt{1 + y'^2 + z'^2}} = 0,$$

$$\frac{\partial v}{\partial z} \frac{\sqrt{1 + y'^2 + z'^2}}{v^2} + \frac{d}{dx} \frac{z'}{v \sqrt{1 + y'^2 + z'^2}} = 0,$$

we obtain the differential equations for the curves along which the light
propagates.
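
To see what these equations say in a familiar situation, suppose (as an assumption made here for illustration) that the light stays in the plane z = 0 and that the velocity depends on y alone, v = v(y). The integrand then does not contain x, so the standard first integral $F - y' F_{y'} = C$ for such integrands applies. The sketch below carries out this computation; $\varphi$ denotes the angle between the ray and the x-axis, a symbol introduced here.

```latex
\begin{align*}
F &= \frac{\sqrt{1 + y'^2}}{v(y)}, \qquad
F_{y'} = \frac{y'}{v(y)\sqrt{1 + y'^2}}, \\
F - y' F_{y'} &= \frac{1 + y'^2 - y'^2}{v(y)\sqrt{1 + y'^2}}
 = \frac{1}{v(y)\sqrt{1 + y'^2}} = C .
\end{align*}
% With y' = \tan\varphi we have 1/\sqrt{1 + y'^2} = \cos\varphi, so
%   \cos\varphi / v(y) = C,
% i.e. the sine of the angle between the ray and the y-axis is
% proportional to v. This is Snell's law of refraction.
```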

Example 2. Geodesics. Suppose we have a surface $\sigma$ specified by a vector
equation¹

$$\mathbf{r} = \mathbf{r}(u, v). \tag{7}$$

The shortest curve lying on $\sigma$ and connecting two points of $\sigma$ is called the
geodesic connecting the two points. Clearly, the equations for the geodesics
of $\sigma$ are the Euler equations of the corresponding variational problem, i.e.,
the problem of finding the minimum distance (measured along $\sigma$) between
two points of $\sigma$.

A curve lying on the surface (7) can be specified by the equations

$$u = u(t), \qquad v = v(t).$$

The arc length between the points corresponding to the values $t_0$ and $t_1$
of the parameter t equals

$$J[u, v] = \int_{t_0}^{t_1} \sqrt{E u'^2 + 2F u'v' + G v'^2}\, dt, \tag{8}$$

where E, F and G are the coefficients of the first fundamental (quadratic)
form of the surface (7), i.e.,²

$$E = \mathbf{r}_u \cdot \mathbf{r}_u, \qquad F = \mathbf{r}_u \cdot \mathbf{r}_v, \qquad G = \mathbf{r}_v \cdot \mathbf{r}_v.$$


¹ Here, vectors are indicated by boldface letters, and $\mathbf{a} \cdot \mathbf{b}$ denotes the scalar product
of the vectors $\mathbf{a}$ and $\mathbf{b}$.

² See D. V. Widder, op. cit., p. 110.




Writing the Euler equations for the functional (8), we obtain

$$\frac{E_u u'^2 + 2F_u u'v' + G_u v'^2}{\sqrt{E u'^2 + 2F u'v' + G v'^2}} - \frac{d}{dt} \frac{2(E u' + F v')}{\sqrt{E u'^2 + 2F u'v' + G v'^2}} = 0,$$

$$\frac{E_v u'^2 + 2F_v u'v' + G_v v'^2}{\sqrt{E u'^2 + 2F u'v' + G v'^2}} - \frac{d}{dt} \frac{2(F u' + G v')}{\sqrt{E u'^2 + 2F u'v' + G v'^2}} = 0.$$

As a very simple illustration of these considerations, we now find the
geodesics of the circular cylinder

$$\mathbf{r} = (a \cos\varphi, a \sin\varphi, z), \tag{9}$$

where the variables $\varphi$ and z play the role of the parameters u and v. Since
the coefficients of the first fundamental form of the cylinder (9) are

$$E = a^2, \qquad F = 0, \qquad G = 1,$$

the geodesics of the cylinder have the equations

$$\frac{d}{dt} \frac{a^2 \varphi'}{\sqrt{a^2 \varphi'^2 + z'^2}} = 0, \qquad \frac{d}{dt} \frac{z'}{\sqrt{a^2 \varphi'^2 + z'^2}} = 0,$$

i.e.,

$$\frac{a^2 \varphi'}{\sqrt{a^2 \varphi'^2 + z'^2}} = C_1, \qquad \frac{z'}{\sqrt{a^2 \varphi'^2 + z'^2}} = C_2.$$

Dividing the second of these equations by the first, we obtain

$$\frac{dz}{d\varphi} = c_1,$$

which has the solution

$$z = c_1 \varphi + c_2,$$

representing a two-parameter family of helical lines lying on the cylinder (9).
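
The helices can also be recognized without solving the Euler equations, by unrolling the cylinder onto a plane. The sketch below records this check; the coordinate $s = a\varphi$ and the slope k are symbols introduced here for illustration.

```latex
% First fundamental form of the cylinder (9): E = a^2, F = 0, G = 1.
\begin{align*}
dl^2 &= a^2\, d\varphi^2 + dz^2 = ds^2 + dz^2, \qquad s = a\varphi .
\end{align*}
% In the (s, z)-plane arc length is Euclidean, so the shortest curves are
% straight lines z = k s + c_2 = (k a)\varphi + c_2, which is
% z = c_1 \varphi + c_2 with c_1 = k a.
% The lines \varphi = const (generators of the cylinder) correspond to
% \varphi' = 0, the case excluded when dividing by the first equation.
```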

The concept of a geodesic can be defined not only for surfaces, but also
for higher-dimensional manifolds. Clearly, finding the geodesics of an
n-dimensional manifold reduces to solving a variational problem for a
functional depending on n functions.


10. Variational Problems in Parametric Form 

So far, we have considered functionals of curves given by explicit equations,
e.g., by equations of the form

$$y = y(x) \tag{10}$$

in the two-dimensional case. However, it is often more convenient to
consider functionals of curves given in parametric form, and in fact we have
already encountered this case in Example 2 of the preceding section (involving
geodesics on a surface). Moreover, in problems involving closed curves
(like the isoperimetric problem mentioned on p. 3), it is usually impossible
to get along without representing the curves in parametric form. Thus,
in this section, we extend our previous results to the case where the curves
are given parametrically, confining ourselves to the simplest variational
problem.

Suppose that in the functional

$$\int_{x_0}^{x_1} F(x, y, y')\, dx, \tag{11}$$

we wish to regard the argument y as a curve which is given in parametric
form, rather than in the form (10). Then (11) can be written as

$$\int_{t_0}^{t_1} F\left[ x(t), y(t), \frac{\dot{y}(t)}{\dot{x}(t)} \right] \dot{x}(t)\, dt = \int_{t_0}^{t_1} \Phi(x, y, \dot{x}, \dot{y})\, dt \tag{12}$$

(where the overdot denotes differentiation with respect to t), i.e., as a
functional depending on two unknown functions x(t) and y(t). The
function $\Phi$ appearing in the right-hand side of (12) does not involve t
explicitly, and is positive-homogeneous of degree 1 in $\dot{x}(t)$ and $\dot{y}(t)$, which
means that

$$\Phi(x, y, \lambda \dot{x}, \lambda \dot{y}) = \lambda \Phi(x, y, \dot{x}, \dot{y}) \tag{13}$$

for every $\lambda > 0$.³

³ The example of the arc-length functional

$$\int_{t_0}^{t_1} \sqrt{\dot{x}^2 + \dot{y}^2}\, dt,$$

whose value does not depend on the direction in which the curve x = x(t), y = y(t) is
traversed, shows why (13) does not hold for $\lambda < 0$.

Conversely, let

$$\int_{t_0}^{t_1} \Phi(x, y, \dot{x}, \dot{y})\, dt$$

be a functional whose integrand $\Phi$ does not involve t explicitly and is
positive-homogeneous of degree 1 in $\dot{x}$ and $\dot{y}$. We now show that the value of such
a functional depends only on the curve in the xy-plane defined by the parametric
equations x = x(t), y = y(t), and not on the functions x(t), y(t)
themselves, i.e., that if we go from t to some new parameter $\tau$ by setting

$$t = t(\tau),$$

where $dt/d\tau > 0$ and the interval $[t_0, t_1]$ goes into $[\tau_0, \tau_1]$, then

$$\int_{\tau_0}^{\tau_1} \Phi\left( x, y, \frac{dx}{d\tau}, \frac{dy}{d\tau} \right) d\tau = \int_{t_0}^{t_1} \Phi(x, y, \dot{x}, \dot{y})\, dt.$$

In fact, since $\Phi$ is positive-homogeneous of degree 1 in $\dot{x}$ and $\dot{y}$, it follows
that

$$\int_{\tau_0}^{\tau_1} \Phi\left( x, y, \frac{dx}{d\tau}, \frac{dy}{d\tau} \right) d\tau = \int_{\tau_0}^{\tau_1} \Phi\left( x, y, \dot{x} \frac{dt}{d\tau}, \dot{y} \frac{dt}{d\tau} \right) d\tau = \int_{\tau_0}^{\tau_1} \Phi(x, y, \dot{x}, \dot{y}) \frac{dt}{d\tau}\, d\tau = \int_{t_0}^{t_1} \Phi(x, y, \dot{x}, \dot{y})\, dt,$$

as asserted. Thus, we have proved the following


Theorem. A necessary and sufficient condition for the functional

$$\int_{t_0}^{t_1} \Phi(t, x, y, \dot{x}, \dot{y})\, dt$$

to depend only on the curve in the xy-plane defined by the parametric
equations x = x(t), y = y(t) and not on the choice of the parametric
representation of the curve, is that the integrand $\Phi$ should not involve
t explicitly and should be a positive-homogeneous function of degree 1 in
$\dot{x}$ and $\dot{y}$.
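
A small computation shows why homogeneity of degree 1 is needed. The integrand, the curve and the change of parameter below are chosen here for illustration only and are not taken from the text.

```latex
% Integrand homogeneous of degree 2 (not 1): \Phi = \dot{x}^2 + \dot{y}^2.
% Curve: the segment x = t, y = 0, 0 <= t <= 1.
\begin{align*}
\int_0^1 \dot{x}^2\, dt &= \int_0^1 1\, dt = 1 .
\end{align*}
% New parameter: t = (\tau + \tau^2)/2, with dt/d\tau = (1 + 2\tau)/2 > 0
% on [0, 1], and [0, 1] goes into [0, 1].
\begin{align*}
\int_0^1 \left( \frac{dx}{d\tau} \right)^2 d\tau
 &= \int_0^1 \frac{(1 + 2\tau)^2}{4}\, d\tau
 = \frac{1}{4}\left( 1 + 2 + \frac{4}{3} \right) = \frac{13}{12} \neq 1 .
\end{align*}
% For the arc-length integrand \sqrt{\dot{x}^2 + \dot{y}^2} (degree 1),
% both parameterizations give the value 1.
```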

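Now, suppose some parameterization of the curve y = y(x) reduces the
functional (11) to the form

$$\int_{t_0}^{t_1} F\left( x, y, \frac{\dot{y}}{\dot{x}} \right) \dot{x}\, dt = \int_{t_0}^{t_1} \Phi(x, y, \dot{x}, \dot{y})\, dt. \tag{14}$$

The variational problem for the right-hand side of (14) leads to the pair of
Euler equations

$$\Phi_x - \frac{d}{dt} \Phi_{\dot{x}} = 0, \qquad \Phi_y - \frac{d}{dt} \Phi_{\dot{y}} = 0, \tag{15}$$

which must be equivalent to the single Euler equation

$$F_y - \frac{d}{dx} F_{y'} = 0,$$

corresponding to the variational problem for the original functional (11).
Hence, the equations (15) cannot be independent, and in fact it is easily
verified that they are connected by the identity

$$\dot{x} \left( \Phi_x - \frac{d}{dt} \Phi_{\dot{x}} \right) + \dot{y} \left( \Phi_y - \frac{d}{dt} \Phi_{\dot{y}} \right) = 0. \tag{16}$$

We shall discuss this point further in Sec. 37.5.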
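
One way to verify the identity (16) is sketched here, under the assumption that $\Phi$ is continuously differentiable and using Euler's theorem on homogeneous functions (a standard fact not quoted in the text).

```latex
% Euler's theorem for \Phi positive-homogeneous of degree 1 in \dot{x}, \dot{y}:
%   \dot{x}\,\Phi_{\dot{x}} + \dot{y}\,\Phi_{\dot{y}} = \Phi .
\begin{align*}
\dot{x}\left( \Phi_x - \frac{d}{dt}\Phi_{\dot{x}} \right)
 + \dot{y}\left( \Phi_y - \frac{d}{dt}\Phi_{\dot{y}} \right)
 &= \dot{x}\Phi_x + \dot{y}\Phi_y + \ddot{x}\Phi_{\dot{x}} + \ddot{y}\Phi_{\dot{y}}
 - \frac{d}{dt}\left( \dot{x}\Phi_{\dot{x}} + \dot{y}\Phi_{\dot{y}} \right) \\
 &= \frac{d\Phi}{dt} - \frac{d\Phi}{dt} = 0 .
\end{align*}
% The first group of four terms is d\Phi/dt because \Phi does not involve
% t explicitly; the last term is d\Phi/dt by Euler's theorem.
```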


11. Functionals Depending on Higher-Order Derivatives

So far, we have considered functionals of the form

$$\int_a^b F(x, y, y')\, dx,$$

depending on the function y(x) and its first derivative y'(x), or of the more
general form

$$\int_a^b F(x, y_1, \ldots, y_n, y_1', \ldots, y_n')\, dx,$$

depending on several functions $y_i(x)$ and their first derivatives $y_i'(x)$. However,
many problems (e.g., in the theory of elasticity) involve functionals
whose integrands contain not only $y_i(x)$ and $y_i'(x)$, but also higher-order
derivatives $y_i''(x)$, $y_i'''(x)$, ... The method given above for finding extrema
of functionals (in the context of necessary conditions for weak extrema) can
be carried over to this more general case without essential changes. For simplicity,
we confine ourselves to the case of a single unknown function y(x).

Thus, let $F(x, y, z_1, \ldots, z_n)$ be a function with continuous first and second
(partial) derivatives with respect to all its arguments, and consider a
functional of the form

$$J[y] = \int_a^b F(x, y, y', \ldots, y^{(n)})\, dx. \tag{17}$$

Then we pose the following problem: Among all functions y(x) belonging to
the space $\mathcal{D}_n(a, b)$ and satisfying the conditions

$$\begin{aligned}
y(a) &= A_0, & y'(a) &= A_1, & \ldots, \quad y^{(n-1)}(a) &= A_{n-1}, \\
y(b) &= B_0, & y'(b) &= B_1, & \ldots, \quad y^{(n-1)}(b) &= B_{n-1},
\end{aligned} \tag{18}$$

find the function for which (17) has an extremum. To solve this problem, we
start from the general result which states that a necessary condition for a
functional J[y] to have an extremum is that its variation vanish (Theorem 2,
p. 13). Thus, suppose we replace y(x) by the “varied” function y(x) + h(x),
where h(x), like y(x), belongs to $\mathcal{D}_n(a, b)$.⁴ By the variation $\delta J$ of the
functional J[y], we mean the expression which is linear in $h, h', \ldots, h^{(n)}$,
and which differs from the increment

$$\Delta J = J[y + h] - J[y]$$

by a quantity of order higher than 1 relative to $h, h', \ldots, h^{(n)}$. Since both
y(x) and y(x) + h(x) satisfy the boundary conditions (18), it is clear that

$$\begin{aligned}
h(a) &= h'(a) = \cdots = h^{(n-1)}(a) = 0, \\
h(b) &= h'(b) = \cdots = h^{(n-1)}(b) = 0.
\end{aligned} \tag{19}$$

⁴ The increment h(x) is often called the variation of y(x). In problems involving
“fixed end point conditions” like (18), we often write $h(x) = \delta y(x)$.

Next, we use Taylor’s theorem, obtaining

$$\Delta J = \int_a^b [F(x, y + h, y' + h', \ldots, y^{(n)} + h^{(n)}) - F(x, y, y', \ldots, y^{(n)})]\, dx = \int_a^b (F_y h + F_{y'} h' + \cdots + F_{y^{(n)}} h^{(n)})\, dx + \cdots,$$

where the dots denote terms of order higher than 1 relative to $h, h', \ldots, h^{(n)}$.
The last integral on the right represents the principal linear part of the
increment $\Delta J$, and hence the variation of J[y] is

$$\delta J = \int_a^b (F_y h + F_{y'} h' + \cdots + F_{y^{(n)}} h^{(n)})\, dx.$$

Therefore, the necessary condition $\delta J = 0$ for an extremum implies that

$$\int_a^b (F_y h + F_{y'} h' + \cdots + F_{y^{(n)}} h^{(n)})\, dx = 0. \tag{20}$$

Repeatedly integrating (20) by parts and using the boundary conditions (19),
we find that

$$\int_a^b \left[ F_y - \frac{d}{dx} F_{y'} + \frac{d^2}{dx^2} F_{y''} - \cdots + (-1)^n \frac{d^n}{dx^n} F_{y^{(n)}} \right] h(x)\, dx = 0 \tag{21}$$

for any function h which has n continuous derivatives and satisfies (19). It
follows from an obvious generalization of Lemma 1 of Sec. 3.1 that

$$F_y - \frac{d}{dx} F_{y'} + \frac{d^2}{dx^2} F_{y''} - \cdots + (-1)^n \frac{d^n}{dx^n} F_{y^{(n)}} = 0, \tag{22}$$

a result again called Euler's equation. Since (22) is a differential equation
of order 2n, its general solution contains 2n arbitrary constants, which can
be determined from the boundary conditions (18).
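
As a simple illustration of (22), take n = 2 and the integrand $F = y''^2$, with boundary values chosen here purely for illustration (neither the integrand nor the boundary data come from the text).

```latex
% F = (y'')^2, so F_y = 0, F_{y'} = 0, F_{y''} = 2 y''.
\begin{align*}
F_y - \frac{d}{dx} F_{y'} + \frac{d^2}{dx^2} F_{y''} &= 2\, y^{(4)} = 0, \\
y &= C_1 + C_2 x + C_3 x^2 + C_4 x^3 .
\end{align*}
% Four constants, as expected for an equation of order 2n = 4.
% Illustrative boundary conditions: y(0) = 0, y'(0) = 0, y(1) = 1, y'(1) = 0.
%   y(0) = 0, y'(0) = 0   give  C_1 = C_2 = 0;
%   y(1) = 1, y'(1) = 0   give  C_3 + C_4 = 1, 2 C_3 + 3 C_4 = 0,
% hence C_3 = 3, C_4 = -2 and y = 3x^2 - 2x^3.
```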


Remark. This derivation of the Euler equation (22) is not completely
rigorous, since the transition from (20) to (21) presupposes the existence
of the derivatives

$$\frac{d}{dx} F_{y'}, \quad \frac{d^2}{dx^2} F_{y''}, \quad \ldots, \quad \frac{d^n}{dx^n} F_{y^{(n)}}. \tag{23}$$

However, by a somewhat more elaborate argument, it can be shown that
(20) implies (22) without this additional hypothesis. In fact, the argument
in question proves the existence of the derivatives (23), as in Lemma 4 of
Sec. 3.1.⁵

⁵ Of course, this argument is unnecessary if it is known in advance that F has continuous
partial derivatives up to order n + 1 (with respect to all its arguments).


12. Variational Problems with Subsidiary Conditions

12.1. The isoperimetric problem. In the simplest variational problem
considered in Chapter 1, the class of admissible curves was specified (apart
from certain smoothness requirements) by conditions imposed on the end
points of the curves. However, many applications of the calculus of variations
lead to problems in which not only boundary conditions, but also
conditions of quite a different type known as subsidiary conditions (synonymously,
side conditions or constraints) are imposed on the admissible curves.
As an example, we first consider the isoperimetric problem,⁶ which can be
stated as follows: Find the curve y = y(x) for which the functional

$$J[y] = \int_a^b F(x, y, y')\, dx \tag{24}$$

has an extremum, where the admissible curves satisfy the boundary conditions

$$y(a) = A, \qquad y(b) = B,$$

and are such that another functional

$$K[y] = \int_a^b G(x, y, y')\, dx \tag{25}$$

takes a fixed value l.

To solve this problem, we assume that the functions F and G defining
the functionals (24) and (25) have continuous first and second derivatives in
[a, b] for arbitrary values of y and y'. Then we have

Theorem 1.⁷ Given the functional

$$J[y] = \int_a^b F(x, y, y')\, dx,$$

let the admissible curves satisfy the conditions

$$y(a) = A, \qquad y(b) = B, \qquad K[y] = \int_a^b G(x, y, y')\, dx = l, \tag{26}$$

where K[y] is another functional, and let J[y] have an extremum for
y = y(x). Then, if y = y(x) is not an extremal of K[y], there exists a
constant $\lambda$ such that y = y(x) is an extremal of the functional

$$\int_a^b (F + \lambda G)\, dx,$$

i.e., y = y(x) satisfies the differential equation

$$F_y - \frac{d}{dx} F_{y'} + \lambda \left( G_y - \frac{d}{dx} G_{y'} \right) = 0. \tag{27}$$

⁶ Originally, the isoperimetric problem referred to the following special problem
(already mentioned on p. 3): Among all closed curves of a given length l, find the curve
enclosing the greatest area. This explains the designation “isoperimetric” = “with the
same perimeter.”

⁷ The reader will easily recognize the analogy between this theorem and the familiar
method of Lagrange multipliers for finding extrema of functions of several variables,
subject to subsidiary conditions. See e.g., D. V. Widder, op. cit., Chap. 4, Sec. 5,
especially Theorem 5.

Proof. Let J[y] have an extremum for the curve y = y(x), subject to
the conditions (26). We choose two points $x_1$ and $x_2$ in the interval
[a, b], where $x_1$ is arbitrary and $x_2$ satisfies a condition to be stated below,
but is otherwise arbitrary. Then we give y(x) an increment $\delta_1 y(x) + \delta_2 y(x)$,
where $\delta_1 y(x)$ is nonzero only in a neighborhood of $x_1$, and
$\delta_2 y(x)$ is nonzero only in a neighborhood of $x_2$. (Concerning this
notation, see footnote 4, p. 41.) Using variational derivatives, we can
write the corresponding increment $\Delta J$ of the functional J in the form

$$\Delta J = \left\{ \left. \frac{\delta F}{\delta y} \right|_{x = x_1} + \varepsilon_1 \right\} \Delta\sigma_1 + \left\{ \left. \frac{\delta F}{\delta y} \right|_{x = x_2} + \varepsilon_2 \right\} \Delta\sigma_2, \tag{28}$$

where

$$\Delta\sigma_1 = \int_a^b \delta_1 y(x)\, dx, \qquad \Delta\sigma_2 = \int_a^b \delta_2 y(x)\, dx,$$

and $\varepsilon_1, \varepsilon_2 \to 0$ as $\Delta\sigma_1, \Delta\sigma_2 \to 0$ (see the Remark on p. 29).
We now require that the “varied” curve

$$y = y^*(x) = y(x) + \delta_1 y(x) + \delta_2 y(x)$$

satisfy the condition

$$K[y^*] = K[y].$$

Writing $\Delta K$ in a form similar to (28), we obtain

$$\Delta K = K[y^*] - K[y] = \left\{ \left. \frac{\delta G}{\delta y} \right|_{x = x_1} + \varepsilon_1' \right\} \Delta\sigma_1 + \left\{ \left. \frac{\delta G}{\delta y} \right|_{x = x_2} + \varepsilon_2' \right\} \Delta\sigma_2 = 0, \tag{29}$$

where $\varepsilon_1', \varepsilon_2' \to 0$ as $\Delta\sigma_1, \Delta\sigma_2 \to 0$. Next, we choose $x_2$ to be a point for
which

$$\left. \frac{\delta G}{\delta y} \right|_{x = x_2} \neq 0.$$

Such a point exists, since by hypothesis y = y(x) is not an extremal of
the functional K. With this choice of $x_2$, we can write the condition
(29) in the form

$$\Delta\sigma_2 = - \left\{ \frac{\left. \dfrac{\delta G}{\delta y} \right|_{x = x_1}}{\left. \dfrac{\delta G}{\delta y} \right|_{x = x_2}} + \varepsilon' \right\} \Delta\sigma_1, \tag{30}$$

where $\varepsilon' \to 0$ as $\Delta\sigma_1 \to 0$. Setting

$$\lambda = - \frac{\left. \dfrac{\delta F}{\delta y} \right|_{x = x_2}}{\left. \dfrac{\delta G}{\delta y} \right|_{x = x_2}}$$

and substituting (30) into the formula (28) for $\Delta J$, we obtain

$$\Delta J = \left\{ \left. \frac{\delta F}{\delta y} \right|_{x = x_1} + \lambda \left. \frac{\delta G}{\delta y} \right|_{x = x_1} \right\} \Delta\sigma_1 + \varepsilon\, \Delta\sigma_1, \tag{31}$$

where $\varepsilon \to 0$ as $\Delta\sigma_1 \to 0$. This expression for $\Delta J$ explicitly involves
variational derivatives only at $x = x_1$, and the increment h(x) is now
just $\delta_1 y(x)$, since the “compensating increment” $\delta_2 y(x)$ has been taken
into account automatically by using the condition $\Delta K = 0$. Thus, the
first term in the right-hand side of (31) is the principal linear part of $\Delta J$,
i.e., the variation of the functional J at the point $x_1$ is

$$\delta J = \left\{ \left. \frac{\delta F}{\delta y} \right|_{x = x_1} + \lambda \left. \frac{\delta G}{\delta y} \right|_{x = x_1} \right\} \Delta\sigma_1.$$

Since a necessary condition for an extremum is that $\delta J = 0$, and since
$\Delta\sigma_1$ is nonzero while $x_1$ is arbitrary, we finally have

$$\frac{\delta F}{\delta y} + \lambda \frac{\delta G}{\delta y} = F_y - \frac{d}{dx} F_{y'} + \lambda \left( G_y - \frac{d}{dx} G_{y'} \right) = 0,$$

which is precisely equation (27). This completes the proof of the
theorem.


To use Theorem 1 to solve a given isoperimetric problem, we first write
the general solution of (27), which will contain two arbitrary constants in
addition to the parameter $\lambda$. We then determine these three quantities from
the boundary conditions y(a) = A, y(b) = B and the subsidiary condition
K[y] = l.
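
The procedure can be traced on a small example, chosen here for illustration and not taken from the text: find the extremal of $\int_0^1 y'^2\, dx$ subject to y(0) = y(1) = 0 and $\int_0^1 y\, dx = l$.

```latex
% F = y'^2, G = y. Since G_y - (d/dx) G_{y'} = 1 \neq 0, no curve is an
% extremal of K[y], so Theorem 1 applies.
\begin{align*}
F_y - \frac{d}{dx} F_{y'} + \lambda \left( G_y - \frac{d}{dx} G_{y'} \right)
 &= -2y'' + \lambda = 0, \\
y &= \frac{\lambda}{4} x^2 + C_1 x + C_2 .
\end{align*}
% Boundary conditions: y(0) = 0 gives C_2 = 0; y(1) = 0 gives C_1 = -\lambda/4,
% so y = (\lambda/4)(x^2 - x).
\begin{align*}
\int_0^1 y\, dx &= \frac{\lambda}{4}\left( \frac{1}{3} - \frac{1}{2} \right)
 = -\frac{\lambda}{24} = l
 \quad\Longrightarrow\quad \lambda = -24\, l, \qquad y = 6\, l\, x (1 - x) .
\end{align*}
```

The two constants and the multiplier are thus fixed by exactly three conditions, as described above.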

Everything just said generalizes immediately to the case of functionals
depending on several functions $y_1, \ldots, y_n$ and subject to several subsidiary
conditions of the form (25). In fact, suppose we are looking for an extremum
of the functional

$$J[y_1, \ldots, y_n] = \int_a^b F(x, y_1, \ldots, y_n, y_1', \ldots, y_n')\, dx, \tag{32}$$

subject to the conditions

$$y_i(a) = A_i, \quad y_i(b) = B_i \qquad (i = 1, \ldots, n) \tag{33}$$

and

$$\int_a^b G_j(x, y_1, \ldots, y_n, y_1', \ldots, y_n')\, dx = l_j \qquad (j = 1, \ldots, k). \tag{34}$$

In this case a necessary condition for an extremum is that

$$\frac{\partial}{\partial y_i} \left( F + \sum_{j=1}^{k} \lambda_j G_j \right) - \frac{d}{dx} \left\{ \frac{\partial}{\partial y_i'} \left( F + \sum_{j=1}^{k} \lambda_j G_j \right) \right\} = 0 \qquad (i = 1, \ldots, n). \tag{35}$$

The 2n arbitrary constants appearing in the solution of the system (35),
and the values of the k parameters $\lambda_1, \ldots, \lambda_k$, sometimes called Lagrange
multipliers, are determined from the boundary conditions (33) and the
subsidiary conditions (34). The proof of (35) is not essentially different
from the proof of Theorem 1, and will not be given here.

12.2. Finite subsidiary conditions. In the isoperimetric problem, the
subsidiary conditions which must be satisfied by the functions $y_1, \ldots, y_n$
are of the form (34), i.e., they are specified by functionals. We now consider
a problem of a different type, which can be stated as follows: Find the
functions $y_i(x)$ for which the functional (32) has an extremum, where the
admissible functions satisfy the boundary conditions

$$y_i(a) = A_i, \quad y_i(b) = B_i \qquad (i = 1, \ldots, n)$$

and k “finite” subsidiary conditions (k < n)

$$g_j(x, y_1, \ldots, y_n) = 0 \qquad (j = 1, \ldots, k). \tag{36}$$

In other words, the functional (32) is not considered for all curves satisfying
the boundary conditions (33), but only for those which lie in the
(n - k)-dimensional manifold defined by the system (36).

For simplicity, we confine ourselves to the case n = 2, k = 1. Then we
have


Theorem 2. Given the functional

$$J[y, z] = \int_a^b F(x, y, z, y', z')\, dx, \tag{37}$$

let the admissible curves lie on the surface

$$g(x, y, z) = 0 \tag{38}$$

and satisfy the boundary conditions

$$\begin{aligned}
y(a) &= A_1, & y(b) &= B_1, \\
z(a) &= A_2, & z(b) &= B_2,
\end{aligned} \tag{39}$$

and moreover, let J[y, z] have an extremum for the curve

$$y = y(x), \qquad z = z(x). \tag{40}$$

Then, if $g_y$ and $g_z$ do not vanish simultaneously at any point of the surface
(38), there exists a function $\lambda(x)$ such that (40) is an extremal of the
functional

$$\int_a^b [F + \lambda(x) g]\, dx,$$

i.e., satisfies the differential equations

$$\begin{aligned}
F_y + \lambda g_y - \frac{d}{dx} F_{y'} &= 0, \\
F_z + \lambda g_z - \frac{d}{dx} F_{z'} &= 0.
\end{aligned} \tag{41}$$
Proof. As might be expected, the proof of this theorem closely
resembles that of Theorem 1. Let J[y, z] have an extremum for the
curve (40), subject to the conditions (38) and (39), and let $x_1$ be an arbitrary
point of the interval [a, b]. Then we give y(x) an increment $\delta y(x)$
and z(x) an increment $\delta z(x)$, where both $\delta y(x)$ and $\delta z(x)$ are nonzero
only in a neighborhood $[\alpha, \beta]$ of $x_1$. Using variational derivatives, we
can write the corresponding increment $\Delta J$ of the functional J[y, z] in the
form

$$\Delta J = \left\{ \left. \frac{\delta F}{\delta y} \right|_{x = x_1} + \varepsilon_1 \right\} \Delta\sigma_1 + \left\{ \left. \frac{\delta F}{\delta z} \right|_{x = x_1} + \varepsilon_2 \right\} \Delta\sigma_2, \tag{42}$$

where

$$\Delta\sigma_1 = \int_a^b \delta y(x)\, dx, \qquad \Delta\sigma_2 = \int_a^b \delta z(x)\, dx,$$

and $\varepsilon_1, \varepsilon_2 \to 0$ as $\Delta\sigma_1, \Delta\sigma_2 \to 0$.

We now require that the “varied” curve

$$y = y^*(x) = y(x) + \delta y(x), \qquad z = z^*(x) = z(x) + \delta z(x)$$

satisfy the condition⁸

$$g(x, y^*, z^*) = 0.$$

In view of (38), this means that

$$0 = \int_a^b [g(x, y^*, z^*) - g(x, y, z)]\, dx = \int_a^b (\bar{g}_y\, \delta y + \bar{g}_z\, \delta z)\, dx = \{ g_y|_{x = x_1} + \varepsilon_1' \} \Delta\sigma_1 + \{ g_z|_{x = x_1} + \varepsilon_2' \} \Delta\sigma_2, \tag{43}$$

where $\varepsilon_1', \varepsilon_2' \to 0$ as $\Delta\sigma_1, \Delta\sigma_2 \to 0$, and the overbar indicates that the
corresponding derivatives are evaluated along certain intermediate curves.
By hypothesis, either $g_y|_{x = x_1}$ or $g_z|_{x = x_1}$ is nonzero. If $g_z|_{x = x_1} \neq 0$, we
can write the condition (43) in the form

$$\Delta\sigma_2 = - \left\{ \frac{g_y|_{x = x_1}}{g_z|_{x = x_1}} + \varepsilon' \right\} \Delta\sigma_1, \tag{44}$$

where $\varepsilon' \to 0$ as $\Delta\sigma_1 \to 0$. Substituting (44) into the formula (42) for
$\Delta J$, we obtain

$$\Delta J = \left\{ \frac{\delta F}{\delta y} - \frac{g_y}{g_z} \frac{\delta F}{\delta z} \right\} \bigg|_{x = x_1} \Delta\sigma_1 + \varepsilon\, \Delta\sigma_1,$$

where $\varepsilon \to 0$ as $\Delta\sigma_1 \to 0$. The first term in the right-hand side is the
principal linear part of $\Delta J$, i.e., the variation of the functional J at the
point $x_1$ is

$$\delta J = \left\{ \frac{\delta F}{\delta y} - \frac{g_y}{g_z} \frac{\delta F}{\delta z} \right\} \bigg|_{x = x_1} \Delta\sigma_1.$$

Since a necessary condition for an extremum is that $\delta J = 0$, and since
$\Delta\sigma_1$ is nonzero while $x_1$ is arbitrary, we finally have

$$\frac{\delta F}{\delta y} - \frac{g_y}{g_z} \frac{\delta F}{\delta z} = F_y - \frac{d}{dx} F_{y'} - \frac{g_y}{g_z} \left( F_z - \frac{d}{dx} F_{z'} \right) = 0$$

or

$$\frac{F_y - \dfrac{d}{dx} F_{y'}}{g_y} = \frac{F_z - \dfrac{d}{dx} F_{z'}}{g_z}. \tag{45}$$

Along the curve y = y(x), z = z(x), the common value of the ratios
(45) is some function of x. If we denote this function by $-\lambda(x)$, then
(45) reduces to precisely the system (41). This completes the proof of
the theorem.

⁸ The existence of admissible curves $y = y^*(x)$, $z = z^*(x)$ close to the original curve
y = y(x), z = z(x) follows from the implicit function theorem, which goes as follows:
If the equation g(x, y, z) = 0 has a solution for $x = x_0$, $y = y_0$, $z = z_0$, if g(x, y, z) and its
first derivatives are continuous in a neighborhood of $(x_0, y_0, z_0)$, and if $g_z(x_0, y_0, z_0) \neq 0$,
then g(x, y, z) = 0 defines a unique function z(x, y) which is continuous and differentiable
with respect to x and y in a neighborhood of $(x_0, y_0)$ and satisfies the condition
$z(x_0, y_0) = z_0$. [There is an exactly analogous theorem for the case where
$g_y(x_0, y_0, z_0) \neq 0$.] Thus, if $g_z[x, y(x), z(x)] \neq 0$ in a neighborhood of the point $x_0$,
we can change the curve y = y(x) to $y = y^*(x)$ in this neighborhood and then determine
$z^*(x)$ from the relation $z^*(x) = z[x, y^*(x)]$.


Remark 1. We note without proof that Theorem 2 remains valid when
the class of admissible curves consists of smooth space curves satisfying the
differential equation⁹

$$g(x, y, z, y', z') = 0. \tag{46}$$

More precisely, if the functional J has an extremum for a curve $\gamma$, subject
to the condition (46), and if the derivatives $g_{y'}$, $g_{z'}$ do not vanish simultaneously
along $\gamma$, then there exists a function $\lambda(x)$ such that $\gamma$ is an integral
curve of the system

$$\Phi_y - \frac{d}{dx} \Phi_{y'} = 0, \qquad \Phi_z - \frac{d}{dx} \Phi_{z'} = 0,$$

where

$$\Phi = F + \lambda(x)\, g.$$

⁹ In mechanics, conditions like (46), which contain derivatives, are called nonholonomic
constraints, and conditions like (38) are called holonomic constraints.

Remark 2. In a certain sense, we can consider a variational problem with
a finite subsidiary condition to be a limiting case of an isoperimetric problem.
In fact, if we assume that the condition (38) does not hold everywhere, but
only at some fixed point

$$g(x_1, y, z) = 0,$$

we obtain a condition whose left-hand side can be regarded as a functional
of y and z, i.e., a condition of the type appearing in the isoperimetric problem.
Thus, the condition (38) can be regarded as an infinite set of conditions,
each of which is a functional. As we have seen, in the isoperimetric problem
the number of Lagrange multipliers $\lambda_1, \ldots, \lambda_k$ equals the number of conditions
of constraint. In the same way, the function $\lambda(x)$ appearing in the
problem with a finite subsidiary condition can be interpreted as a “Lagrange
multiplier for each point x.”

Example 1. Among all curves of length l in the upper half-plane passing
through the points (-a, 0) and (a, 0), find the one which together with the
interval [-a, a] encloses the largest area. We are looking for the function
y = y(x) for which the integral

$$J[y] = \int_{-a}^{a} y\, dx$$

takes the largest value subject to the conditions

$$y(-a) = y(a) = 0, \qquad K[y] = \int_{-a}^{a} \sqrt{1 + y'^2}\, dx = l.$$

Thus, we are dealing with an isoperimetric problem. Using Theorem 1,
we form the functional

$$J[y] + \lambda K[y] = \int_{-a}^{a} \left( y + \lambda \sqrt{1 + y'^2} \right) dx,$$

and write the corresponding Euler equation

$$1 - \lambda \frac{d}{dx} \frac{y'}{\sqrt{1 + y'^2}} = 0,$$

which implies

$$x - \lambda \frac{y'}{\sqrt{1 + y'^2}} = C_1. \tag{47}$$

Integrating (47), we obtain the equation

$$(x - C_1)^2 + (y - C_2)^2 = \lambda^2$$

of a family of circles. The values of $C_1$, $C_2$ and $\lambda$ are then determined from
the conditions

$$y(-a) = y(a) = 0, \qquad K[y] = l.$$
Example 2. Among all curves lying on the sphere $x^2 + y^2 + z^2 = a^2$ and
passing through two given points $(x_0, y_0, z_0)$ and $(x_1, y_1, z_1)$, find the one which
has the least length. The length of the curve y = y(x), z = z(x) is given by
the integral

$$\int_{x_0}^{x_1} \sqrt{1 + y'^2 + z'^2}\, dx.$$

Using Theorem 2, we form the auxiliary functional

$$\int_{x_0}^{x_1} \left[ \sqrt{1 + y'^2 + z'^2} + \lambda(x)(x^2 + y^2 + z^2 - a^2) \right] dx$$

and write the corresponding Euler equations

$$2y\lambda(x) - \frac{d}{dx} \frac{y'}{\sqrt{1 + y'^2 + z'^2}} = 0,$$

$$2z\lambda(x) - \frac{d}{dx} \frac{z'}{\sqrt{1 + y'^2 + z'^2}} = 0.$$

Solving these equations, we obtain a family of curves depending on four
constants, whose values are determined from the boundary conditions

$$\begin{aligned}
y(x_0) &= y_0, & y(x_1) &= y_1, \\
z(x_0) &= z_0, & z(x_1) &= z_1.
\end{aligned}$$

Remark. As is familiar from elementary analysis, in finding an extremum
of a function of n variables subject to k constraints (k < n), we can use the
constraints to express k variables in terms of the other n - k variables.
In this way, the problem is reduced to that of finding an unconstrained
extremum of a function of n - k variables, i.e., an extremum subject to no
subsidiary conditions. The situation is the same in the calculus of variations.
For example, the problem of finding geodesics on a given surface can be
regarded as a problem subject to a constraint, as in Example 2 of this section.
On the other hand, if we express the coordinates x, y and z as functions of
two parameters, we can reduce the problem to that of finding an unconstrained
extremum, as in Example 2 of Sec. 9.


PROBLEMS 

1. Find the extremals of the functional

$$J[y, z] = \int_0^{\pi/2} (y'^2 + z'^2 + 2yz) \, dx,$$

subject to the boundary conditions

$$y(0) = 0, \quad y(\pi/2) = 1, \quad z(0) = 0, \quad z(\pi/2) = 1.$$

2. Find the extremals of the fixed end point problems corresponding to the
following functionals:

a) $\displaystyle\int_{x_0}^{x_1} (y'^2 + z'^2 + y'z') \, dx$;

b) $\displaystyle\int_{x_0}^{x_1} (2yz - 2y^2 + y'^2 - z'^2) \, dx$.

3. Find the extremals of a functional of the form

$$\int_{x_0}^{x_1} F(y', z') \, dx,$$

given that $F_{y'y'} F_{z'z'} - (F_{y'z'})^2 \neq 0$ for $x_0 \leq x \leq x_1$.

Ans. A family of straight lines in three dimensions. 
4. State and prove the generalization of Theorem 3 of Sec. 4.1 for functionals
of the form

$$\int_a^b F(x, y_1, \ldots, y_n, y_1', \ldots, y_n') \, dx.$$

Hint. The condition $F_{y'y'} \neq 0$ is replaced by the condition $\det \|F_{y_i' y_k'}\| \neq 0$.

5. What is the condition for a functional of the form

$$\int_{t_0}^{t_1} F(t, y_1, \ldots, y_n, y_1', \ldots, y_n') \, dt,$$

depending on an $n$-dimensional curve $y_i = y_i(t)$, $i = 1, \ldots, n$, to be
independent of the parameterization?

6. Generalizing the definition of Sec. 10, we say that the function $f(x_1, \ldots, x_n)$
is positive-homogeneous of degree $k$ in $x_1, \ldots, x_n$ if

$$f(\lambda x_1, \ldots, \lambda x_n) = \lambda^k f(x_1, \ldots, x_n)$$

for every $\lambda > 0$. Prove the following result, known as Euler's theorem:
If $f(x_1, \ldots, x_n)$ is continuously differentiable and positive-homogeneous of
degree $k$, then

$$\sum_{i=1}^{n} \frac{\partial f}{\partial x_i} x_i = k f.$$

7. State and prove the converse of Euler’s theorem.

8. Verify formula (16) of Sec. 10.

Hint. Use Euler’s theorem. 

9. Prove that the Euler equations (15) of the variational problem in parametric
form can be written as

$$\Phi_{x\dot{y}} - \Phi_{y\dot{x}} + (\dot{x}\ddot{y} - \ddot{x}\dot{y})\Phi_1 = 0, \tag{a}$$

where $\Phi_1$ is a positive-homogeneous function of degree $-3$ satisfying the
relations

$$\Phi_{\dot{x}\dot{x}} = \dot{y}^2 \Phi_1, \qquad \Phi_{\dot{x}\dot{y}} = -\dot{x}\dot{y}\Phi_1, \qquad \Phi_{\dot{y}\dot{y}} = \dot{x}^2 \Phi_1.$$

Comment. Equation (a) is known as Weierstrass’ form of the Euler
equations. It can also be written as

$$\frac{1}{\rho} = \frac{\Phi_{y\dot{x}} - \Phi_{x\dot{y}}}{\Phi_1 (\dot{x}^2 + \dot{y}^2)^{3/2}},$$

where $\rho$ is the radius of curvature of the extremal.

10. Prove that Weierstrass’ form of the Euler equations is invariant under
parameter changes $t = t(\tau)$, $dt/d\tau > 0$.

11. Find the extremals of the functional

$$J[y] = \int_0^1 (1 + y''^2) \, dx,$$

subject to the boundary conditions

$$y(0) = 0, \quad y'(0) = 1, \quad y(1) = 1, \quad y'(1) = 1.$$
12. Find the extremals of the functional

$$J[y] = \int_0^{\pi/2} (y''^2 - y^2 + x^2) \, dx,$$

subject to the boundary conditions

$$y(0) = 1, \quad y'(0) = 0, \quad y(\pi/2) = 0, \quad y'(\pi/2) = 1.$$

13. Show that the Euler equation of the functional

$$\int_{x_0}^{x_1} F(x, y, y', y'') \, dx$$

has the first integral

$$F_{y'} - \frac{d}{dx} F_{y''} = \text{const}$$

if the integrand does not depend on $y$, and the first integral

$$F - y' \left( F_{y'} - \frac{d}{dx} F_{y''} \right) - y'' F_{y''} = \text{const}$$

if the integrand does not depend on $x$.

14. Find the curve joining the points (0, 0) and (1, 0) for which the integral

$$\int_0^1 y''^2 \, dx$$

is a minimum if

a) $y'(0) = a$, $y'(1) = b$;

b) No other conditions are prescribed.

15. Supply the details of the argument mentioned in the remark on p. 42. 

16. By direct calculation, without recourse to variational methods, prove
that the isosceles triangle has the greatest area among all triangles with a
given base line and a given perimeter.

Hint. All the triangles in question have the given base line and a vertex 
lying on a certain ellipse. 

17. Find the equilibrium position of a heavy flexible inextensible cord of
length $l$, fastened at its ends.

Hint. Minimize the ordinate of the center of gravity of the cord. By 
making a suitable change of variables, reduce the problem to Example 2 of 
Sec. 4.2. 

18. Find the extremals of the functional

$$J[y] = \int_0^1 (y'^2 + x^2) \, dx,$$

subject to the conditions

$$y(0) = 0, \qquad y(1) = 0, \qquad \int_0^1 y^2 \, dx = 2.$$


19. Suppose an airplane with fixed air speed $v_0$ makes a flight lasting $T$
seconds. Along what closed curve should it fly if this curve is to enclose
the greatest area? It is assumed that the wind velocity has constant direction
and magnitude $a < v_0$.

Ans. An ellipse whose major axis is perpendicular to the wind velocity
and whose eccentricity is $a/v_0$. The velocity of the airplane is perpendicular
to the radius vector of the ellipse.

20. Given two points $A$ and $B$ in the $xy$-plane, let $\gamma$ be a fixed curve joining
them. Among all curves of length $l$ joining $A$ and $B$, find the curve which
together with $\gamma$ encloses the greatest area.

21. Generalizing the preceding problem, suppose the $xy$-plane is covered by
a mass distribution with continuous density $\mu(x, y)$. As before, let $A$ and $B$
be two points in the plane, and let $\gamma$ be a fixed curve joining them. Among
all curves of length $l$ joining $A$ and $B$, find the curve which together with $\gamma$
bounds the region of greatest mass.

Hint. Introduce the auxiliary function $V(x, y) = \int \mu(x, y) \, dx$. Then use
Green’s theorem and Weierstrass’ form of the Euler equations.

22. Among all curves joining a given point $(0, b)$ on the $y$-axis to a point on
the $x$-axis and enclosing a given area $S$ together with the $x$-axis, find the curve
which generates the least area when rotated about the $x$-axis.

Ans. The line

$$\frac{x}{a} + \frac{y}{b} = 1,$$

where $ab = 2S$.



3 


THE GENERAL VARIATION 
OF A FUNCTIONAL 


13. Derivation of the Basic Formula 

In this section, we derive the general formula for the variation of
a functional of the form

$$J[y_1, \ldots, y_n] = \int_{x_0}^{x_1} F(x, y_1, \ldots, y_n, y_1', \ldots, y_n') \, dx, \tag{1}$$

beginning with the case where (1) depends on a single function $y$ and hence
reduces to

$$J[y] = \int_{x_0}^{x_1} F(x, y, y') \, dx. \tag{2}$$

We assume that all admissible curves are smooth, but, departing from our
previous hypothesis, we assume that the end points of the curves for which
(2) is defined can move in an arbitrary way. By the distance between two
curves $y = y(x)$ and $y = y^*(x)$ is meant the quantity

$$\rho(y, y^*) = \max |y - y^*| + \max |y' - y^{*\prime}| + \rho(P_0, P_0^*) + \rho(P_1, P_1^*), \tag{3}$$

where $P_0$, $P_0^*$ denote the left-hand end points of the curves $y = y(x)$,
$y = y^*(x)$, respectively, and $P_1$, $P_1^*$ denote their right-hand end points.¹
In general, the functions $y$ and $y^*$ are defined on different intervals $I$ and $I^*$.
Thus, in order for (3) to make sense, we have to extend $y$ and $y^*$ onto some
interval containing both $I$ and $I^*$. For example, this can be done by drawing
tangents to the curves at their end points, as shown in Figure 4.


1 In the right-hand side of (3), $\rho$ denotes the ordinary Euclidean distance.
Now let $y = y(x)$ and $y = y^*(x)$ be two neighboring curves, in the sense
of the distance (3), and let²

$$h(x) = y^*(x) - y(x).$$

Moreover, let

$$P_0 = (x_0, y_0), \qquad P_1 = (x_1, y_1)$$

denote the end points of the curve $y = y(x)$, while the end points of the
curve $y = y^*(x) = y(x) + h(x)$ are denoted by

$$P_0^* = (x_0 + \delta x_0, y_0 + \delta y_0), \qquad P_1^* = (x_1 + \delta x_1, y_1 + \delta y_1).$$



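The corresponding variation $\delta J$ of the functional $J[y]$ is defined as the
expression which is linear in $h$, $h'$, $\delta x_0$, $\delta y_0$, $\delta x_1$, $\delta y_1$, and which differs from
the increment

$$\Delta J = J[y + h] - J[y]$$

by a quantity of order higher than 1 relative to $\rho(y, y + h)$. Since³

$$\begin{aligned}
\Delta J &= \int_{x_0 + \delta x_0}^{x_1 + \delta x_1} F(x, y + h, y' + h') \, dx - \int_{x_0}^{x_1} F(x, y, y') \, dx \\
&= \int_{x_0}^{x_1} [F(x, y + h, y' + h') - F(x, y, y')] \, dx \\
&\quad + \int_{x_1}^{x_1 + \delta x_1} F(x, y + h, y' + h') \, dx - \int_{x_0}^{x_0 + \delta x_0} F(x, y + h, y' + h') \, dx,
\end{aligned} \tag{4}$$

it follows by using Taylor’s theorem and letting the symbol $\sim$ denote equality
except for terms of order higher than 1 relative to $\rho(y, y + h)$ that

$$\begin{aligned}
\Delta J &\sim \int_{x_0}^{x_1} [F_y(x, y, y') h + F_{y'}(x, y, y') h'] \, dx
 + F(x, y, y')\big|_{x = x_1} \delta x_1 - F(x, y, y')\big|_{x = x_0} \delta x_0 \\
&= \int_{x_0}^{x_1} \left[ F_y - \frac{d}{dx} F_{y'} \right] h(x) \, dx
 + F\big|_{x = x_1} \delta x_1 + F_{y'} h\big|_{x = x_1}
 - F\big|_{x = x_0} \delta x_0 - F_{y'} h\big|_{x = x_0},
\end{aligned}$$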

2 Note that it is no longer appropriate to write $h(x) = \delta y(x)$, as in footnote 4, p. 41.

In fact, in the more precise notation of Sec. 37, h(x) = 

3 Recall that we have agreed to extend $y(x)$ and $y^*(x)$ linearly onto the interval
$[x_0, x_1 + \delta x_1]$, so that all integrals in the right-hand side of (4) are meaningful.
where the term containing $h'$ has been integrated by parts. However, it is
clear from Figure 4 that

$$h(x_0) \sim \delta y_0 - y'(x_0) \, \delta x_0,$$
$$h(x_1) \sim \delta y_1 - y'(x_1) \, \delta x_1,$$

where $\sim$ has the same meaning as before, and hence

$$\begin{aligned}
\delta J = {} & \int_{x_0}^{x_1} \left[ F_y - \frac{d}{dx} F_{y'} \right] h(x) \, dx
 + F_{y'}\big|_{x = x_1} \delta y_1 + (F - F_{y'} y')\big|_{x = x_1} \delta x_1 \\
& - F_{y'}\big|_{x = x_0} \delta y_0 - (F - F_{y'} y')\big|_{x = x_0} \delta x_0,
\end{aligned} \tag{5}$$

or more concisely,

$$\delta J = \int_{x_0}^{x_1} \left[ F_y - \frac{d}{dx} F_{y'} \right] h(x) \, dx
 + F_{y'} \, \delta y \Big|_{x = x_0}^{x = x_1} + (F - F_{y'} y') \, \delta x \Big|_{x = x_0}^{x = x_1},$$

where we define

$$\delta x\big|_{x = x_i} = \delta x_i, \qquad \delta y\big|_{x = x_i} = \delta y_i \qquad (i = 0, 1).$$

This is the basic formula for the general variation of the functional $J[y]$.
If the end points of the admissible curves are constrained to lie on the straight
lines $x = x_0$, $x = x_1$, as in the simple variable end point problem considered
in Sec. 6, then $\delta x_0 = \delta x_1 = 0$, while, in the case of the fixed end point
problem, $\delta x_0 = \delta x_1 = 0$ and $\delta y_0 = \delta y_1 = 0$.
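
Formula (5) can be checked numerically on a simple case. The following sketch is illustrative only: the integrand $F = y'^2$, the curve $y = x^2$ on $[0, 1]$, the perturbation $\varepsilon \sin x$ and the end point shifts $\varepsilon a_0$, $\varepsilon a_1$ are all assumptions made for the example, not data from the text. It compares the exact increment $\Delta J$ with the linear expression (5); the two should differ by a quantity of order $\varepsilon^2$.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def compare(eps, a0=0.3, a1=0.5):
    """Return (exact increment, linear part given by formula (5)).

    Illustrative choices: F = y'^2, y = x^2 on [0, 1], and
    y* = x^2 + eps*sin(x) on [eps*a0, 1 + eps*a1].
    """
    x0, x1 = 0.0, 1.0
    dx0, dx1 = eps * a0, eps * a1

    y = lambda x: x * x
    dy = lambda x: 2 * x
    y_star = lambda x: x * x + eps * math.sin(x)
    dy_star = lambda x: 2 * x + eps * math.cos(x)
    h = lambda x: y_star(x) - y(x)

    # Exact increment J[y*] - J[y], each integral over its own interval.
    exact = (simpson(lambda x: dy_star(x) ** 2, x0 + dx0, x1 + dx1)
             - simpson(lambda x: dy(x) ** 2, x0, x1))

    # Ingredients of (5) for F = y'^2 along y = x^2.
    euler = lambda x: -4.0              # F_y - d/dx F_y' = -2 y'' = -4
    F_yp = lambda x: 2 * dy(x)          # F_y'
    F_minus = lambda x: -dy(x) ** 2     # F - y' F_y'

    # Displacements of the end points in the y-direction.
    dy0 = y_star(x0 + dx0) - y(x0)
    dy1 = y_star(x1 + dx1) - y(x1)

    linear = (simpson(lambda x: euler(x) * h(x), x0, x1)
              + F_yp(x1) * dy1 + F_minus(x1) * dx1
              - F_yp(x0) * dy0 - F_minus(x0) * dx0)
    return exact, linear


if __name__ == "__main__":
    for eps in (1e-1, 1e-2, 1e-3):
        exact, linear = compare(eps)
        print(eps, exact, linear, exact - linear)
```

Reducing $\varepsilon$ by a factor of 10 should reduce the printed difference by roughly a factor of 100, which is the sense in which (5) is the principal linear part of $\Delta J$.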

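Next, we return to the more general functional (1), which depends on
$n$ functions $y_1, \ldots, y_n$. Since any system of $n$ functions can be interpreted
as a curve in $(n + 1)$-dimensional Euclidean space $\mathcal{E}_{n+1}$, we can regard (1)
as defined on some set of curves in $\mathcal{E}_{n+1}$. Paralleling the treatment just
given for $n = 1$, we now calculate the variation of the functional (1) when
there are no restrictions on the end points of the admissible curves. As
before, we write

$$h_i(x) = y_i^*(x) - y_i(x) \qquad (i = 1, \ldots, n),$$

where for each $i$, the function $y_i^*(x)$ is close to $y_i(x)$ in the sense of the distance
(3). Moreover, we let

$$P_0 = (x_0, y_1^0, \ldots, y_n^0), \qquad P_1 = (x_1, y_1^1, \ldots, y_n^1)$$

denote the end points of the curve $y_i = y_i(x)$, $i = 1, \ldots, n$, while the end
points of the curve $y_i = y_i^*(x) = y_i(x) + h_i(x)$, $i = 1, \ldots, n$, are denoted by

$$P_0^* = (x_0 + \delta x_0, y_1^0 + \delta y_1^0, \ldots, y_n^0 + \delta y_n^0),$$

$$P_1^* = (x_1 + \delta x_1, y_1^1 + \delta y_1^1, \ldots, y_n^1 + \delta y_n^1),$$

and once more, we extend the functions $y_i(x)$ and $y_i^*(x)$ linearly onto the
interval $[x_0, x_1 + \delta x_1]$. The corresponding variation $\delta J$ of the functional
$J[y_1, \ldots, y_n]$ is defined as the expression which is linear in $\delta x_0$, $\delta x_1$ and all
the quantities $h_i$, $h_i'$, $\delta y_i^0$, $\delta y_i^1$ ($i = 1, \ldots, n$), and which differs from the
increment

$$\Delta J = J[y_1 + h_1, \ldots, y_n + h_n] - J[y_1, \ldots, y_n]$$

by a quantity of order higher than 1 relative to

$$\rho(y_1, y_1^*) + \cdots + \rho(y_n, y_n^*). \tag{6}$$

Since

$$\begin{aligned}
\Delta J &= \int_{x_0 + \delta x_0}^{x_1 + \delta x_1} F(x, \ldots, y_i + h_i, y_i' + h_i', \ldots) \, dx - \int_{x_0}^{x_1} F(x, \ldots, y_i, y_i', \ldots) \, dx \\
&= \int_{x_0}^{x_1} [F(x, \ldots, y_i + h_i, y_i' + h_i', \ldots) - F(x, \ldots, y_i, y_i', \ldots)] \, dx \\
&\quad + \int_{x_1}^{x_1 + \delta x_1} F(x, \ldots, y_i + h_i, y_i' + h_i', \ldots) \, dx \\
&\quad - \int_{x_0}^{x_0 + \delta x_0} F(x, \ldots, y_i + h_i, y_i' + h_i', \ldots) \, dx,
\end{aligned}$$

it follows by using Taylor’s theorem and letting the symbol $\sim$ denote
equality except for terms of order higher than 1 relative to the quantity (6)
that

$$\begin{aligned}
\Delta J &\sim \int_{x_0}^{x_1} \sum_{i=1}^{n} (F_{y_i} h_i + F_{y_i'} h_i') \, dx + F\big|_{x = x_1} \delta x_1 - F\big|_{x = x_0} \delta x_0 \\
&= \int_{x_0}^{x_1} \sum_{i=1}^{n} \left( F_{y_i} - \frac{d}{dx} F_{y_i'} \right) h_i(x) \, dx
 + F\big|_{x = x_1} \delta x_1 + \sum_{i=1}^{n} F_{y_i'} h_i \Big|_{x = x_1} \\
&\quad - F\big|_{x = x_0} \delta x_0 - \sum_{i=1}^{n} F_{y_i'} h_i \Big|_{x = x_0},
\end{aligned}$$

where the terms containing $h_i'$ have been integrated by parts. Just as in the
case $n = 1$, we have

$$h_i(x_0) \sim \delta y_i^0 - y_i'(x_0) \, \delta x_0,$$

$$h_i(x_1) \sim \delta y_i^1 - y_i'(x_1) \, \delta x_1,$$

and hence

$$\begin{aligned}
\delta J = {} & \int_{x_0}^{x_1} \sum_{i=1}^{n} \left( F_{y_i} - \frac{d}{dx} F_{y_i'} \right) h_i(x) \, dx \\
& + \sum_{i=1}^{n} F_{y_i'} \Big|_{x = x_1} \delta y_i^1 + \left( F - \sum_{i=1}^{n} y_i' F_{y_i'} \right) \Big|_{x = x_1} \delta x_1 \\
& - \sum_{i=1}^{n} F_{y_i'} \Big|_{x = x_0} \delta y_i^0 - \left( F - \sum_{i=1}^{n} y_i' F_{y_i'} \right) \Big|_{x = x_0} \delta x_0,
\end{aligned}$$

or more concisely,

$$\delta J = \int_{x_0}^{x_1} \sum_{i=1}^{n} \left( F_{y_i} - \frac{d}{dx} F_{y_i'} \right) h_i(x) \, dx
 + \sum_{i=1}^{n} F_{y_i'} \, \delta y_i \Big|_{x = x_0}^{x = x_1}
 + \left( F - \sum_{i=1}^{n} y_i' F_{y_i'} \right) \delta x \Big|_{x = x_0}^{x = x_1}, \tag{7}$$

where, as before, we define

$$\delta x\big|_{x = x_j} = \delta x_j, \qquad \delta y_i\big|_{x = x_j} = \delta y_i^j \qquad (j = 0, 1).$$
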
This is the basic formula for the general variation of the functional $J[y_1, \ldots, y_n]$.

We now write an even more concise formula for the variation (7), at
the same time introducing some important new ideas, to be discussed in
more detail in the next chapter. Let

$$p_i = F_{y_i'} \qquad (i = 1, \ldots, n), \tag{8}$$

and suppose that the Jacobian

$$\frac{\partial(p_1, \ldots, p_n)}{\partial(y_1', \ldots, y_n')} = \det \|F_{y_i' y_k'}\|$$

is nonzero.⁴ Then we can solve the equations (8) for $y_1', \ldots, y_n'$ as functions
of the variables

$$x, y_1, \ldots, y_n, p_1, \ldots, p_n. \tag{9}$$

Next, we express the function $F(x, y_1, \ldots, y_n, y_1', \ldots, y_n')$ appearing in (1)
in terms of a new function $H(x, y_1, \ldots, y_n, p_1, \ldots, p_n)$ related to $F$ by the
formula

$$H = -F + \sum_{i=1}^{n} y_i' F_{y_i'} = -F + \sum_{i=1}^{n} y_i' p_i,$$

where the $y_i'$ are regarded as functions of the variables (9). The function
$H$ is called the Hamiltonian (function) corresponding to the functional
$J[y_1, \ldots, y_n]$. In this way, we can make a local transformation (see footnote
2, p. 68) from the “variables” $x, y_1, \ldots, y_n, y_1', \ldots, y_n', F$ appearing in (1)
to the new quantities $x, y_1, \ldots, y_n, p_1, \ldots, p_n, H$, called the canonical
variables (corresponding to the functional $J[y_1, \ldots, y_n]$). In terms of the
canonical variables, we can write (7) in the form

$$\delta J = \int_{x_0}^{x_1} \sum_{i=1}^{n} \left( F_{y_i} - \frac{dp_i}{dx} \right) h_i(x) \, dx
 + \left( \sum_{i=1}^{n} p_i \, \delta y_i - H \, \delta x \right) \Big|_{x = x_0}^{x = x_1}.$$
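
To make the passage to canonical variables concrete, here is a short worked case. The integrand below, with positive constants $m_i$ and a function $U$, is an illustrative assumption of ours and is not taken from the text.

```latex
% Illustrative integrand (an assumption): a "kinetic minus potential" form.
F = \frac{1}{2} \sum_{i=1}^{n} m_i y_i'^2 - U(x, y_1, \ldots, y_n), \qquad m_i > 0.

% Canonical momenta, from p_i = F_{y_i'}:
p_i = F_{y_i'} = m_i y_i' \qquad (i = 1, \ldots, n).

% The Jacobian is nonzero, so the relations can be solved for the y_i':
\det \| F_{y_i' y_k'} \| = m_1 m_2 \cdots m_n \neq 0, \qquad y_i' = \frac{p_i}{m_i}.

% Hamiltonian, from H = -F + \sum y_i' p_i:
H = -\frac{1}{2} \sum_{i=1}^{n} \frac{p_i^2}{m_i} + U + \sum_{i=1}^{n} \frac{p_i^2}{m_i}
  = \sum_{i=1}^{n} \frac{p_i^2}{2 m_i} + U(x, y_1, \ldots, y_n).
```

In this case the solution of $p_i = F_{y_i'}$ for $y_i'$ is global rather than merely local, because $F$ is quadratic in the $y_i'$.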

Remark. Suppose the functional $J[y_1, \ldots, y_n]$ has an extremum (in a
certain class of admissible curves) for some curve

$$y_i = y_i(x) \qquad (i = 1, \ldots, n) \tag{10}$$

joining the points

$$P_0 = (x_0, y_1^0, \ldots, y_n^0), \qquad P_1 = (x_1, y_1^1, \ldots, y_n^1).$$

Then, since $J[y_1, \ldots, y_n]$ has an extremum for (10) compared to all admissible
curves, it certainly has an extremum for (10) compared to all curves with
fixed end points $P_0$ and $P_1$. Therefore, (10) is an extremal, i.e., a solution
of the Euler equations

$$F_{y_i} - \frac{d}{dx} F_{y_i'} = 0 \qquad (i = 1, \ldots, n),$$


4 By $\det \|a_{ik}\|$ is meant the determinant of the matrix $\|a_{ik}\|$.
so that the integral in (7) vanishes, and we are left with the formula

$$\delta J = \left[ \sum_{i=1}^{n} F_{y_i'} \, \delta y_i + \left( F - \sum_{i=1}^{n} y_i' F_{y_i'} \right) \delta x \right] \Big|_{x = x_0}^{x = x_1}, \tag{11}$$

or in canonical variables

$$\delta J = \left( \sum_{i=1}^{n} p_i \, \delta y_i - H \, \delta x \right) \Big|_{x = x_0}^{x = x_1}. \tag{12}$$

Thus, regardless of the boundary conditions defining our variable end point
problem, the curve for which $J[y_1, \ldots, y_n]$ has an extremum must first be
an extremal and then satisfy the condition that (11) or (12) vanish (see
Problem 1, p. 63).


14. End Points Lying on Two Given Curves or Surfaces 

The first two chapters of this book have been devoted mainly to fixed
end point problems, where the boundary conditions require that all admissible
curves have two given end points. The only exception is the simple variable
end point problem considered in Sec. 6, where the end points of the admissible
curves are free to move along two fixed straight lines parallel to the $y$-axis.
We now consider a more general variable end point problem. To keep
matters simple, we start with the case where there is only one unknown
function. Our problem can be stated as follows: Among all smooth curves
whose end points $P_0$ and $P_1$ lie on two given curves $y = \varphi(x)$ and $y = \psi(x)$,
find the curve for which the functional

$$J[y] = \int_{x_0}^{x_1} F(x, y, y') \, dx$$

has an extremum. For example, the problem of finding the distance between
two plane curves is of this type, with

$$F(x, y, y') = \sqrt{1 + y'^2}.$$

As shown in the preceding section, the general variation of the functional
$J[y]$ is given by formula (5). If $J[y]$ has an extremum for the curve $y = y(x)$,
then, as noted at the end of Sec. 13, this curve must first of all be an
extremal, i.e., a solution of Euler’s equation. Hence, the integral in (5)
vanishes and we have

$$\begin{aligned}
\delta J = {} & F_{y'}\big|_{x = x_1} \delta y_1 + (F - F_{y'} y')\big|_{x = x_1} \delta x_1 \\
& - F_{y'}\big|_{x = x_0} \delta y_0 - (F - F_{y'} y')\big|_{x = x_0} \delta x_0,
\end{aligned}$$

which must vanish if $J[y]$ is to have an extremum for $y = y(x)$.
Next, we observe that according to Figure 5,

$$\delta y_0 = [\varphi'(x_0) + \varepsilon_0] \, \delta x_0, \qquad \delta y_1 = [\psi'(x_1) + \varepsilon_1] \, \delta x_1,$$

where $\varepsilon_0 \to 0$ as $\delta x_0 \to 0$, and $\varepsilon_1 \to 0$ as $\delta x_1 \to 0$. Thus, in the present case,
the condition $\delta J = 0$ becomes

$$\delta J = (F_{y'} \psi' + F - y' F_{y'})\big|_{x = x_1} \delta x_1 - (F_{y'} \varphi' + F - y' F_{y'})\big|_{x = x_0} \delta x_0 = 0, \tag{13}$$

since $\delta J$ contains only terms of the first order in $\delta x_0$ and $\delta x_1$. Since the
increments $\delta x_0$ and $\delta x_1$ are independent, (13) implies the boundary conditions

$$(F_{y'} \varphi' + F - y' F_{y'})\big|_{x = x_0} = 0,$$

$$(F_{y'} \psi' + F - y' F_{y'})\big|_{x = x_1} = 0,$$

or

$$[F + (\varphi' - y') F_{y'}]\big|_{x = x_0} = 0,$$

$$[F + (\psi' - y') F_{y'}]\big|_{x = x_1} = 0,$$

called the transversality conditions. The curve $y = y(x)$ satisfying these
conditions is said to be a transversal of the curves $y = \varphi(x)$ and $y = \psi(x)$.

Thus, to solve this kind of variable end point problem, we must first
solve Euler’s equation

$$F_y - \frac{d}{dx} F_{y'} = 0, \tag{14}$$

and then use the transversality conditions to determine the values of the two
arbitrary constants appearing in the general solution of (14).

In solving variational problems, we often encounter functionals of the
form

$$\int_{x_0}^{x_1} f(x, y) \sqrt{1 + y'^2} \, dx. \tag{15}$$

For such functionals, the transversality conditions have a particularly simple
appearance. In fact, in this case,

$$F_{y'} = f(x, y) \frac{y'}{\sqrt{1 + y'^2}} = \frac{y' F}{1 + y'^2},$$

so that the transversality conditions become

$$F + (\varphi' - y') F_{y'} = \frac{(1 + y' \varphi') F}{1 + y'^2} = 0,$$

$$F + (\psi' - y') F_{y'} = \frac{(1 + y' \psi') F}{1 + y'^2} = 0.$$

It follows that

$$y' = -\frac{1}{\varphi'}$$

at the left-hand end point, while

$$y' = -\frac{1}{\psi'}$$

at the right-hand end point, i.e., for functionals of the form (15),
transversality reduces to orthogonality.
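
The orthogonality property can be observed numerically for the distance functional $F = \sqrt{1 + y'^2}$, whose extremals are straight lines. In the sketch below, the boundary curves $\varphi(x) = x^2$ and $\psi(x) = x - 1$, and the crude grid-refinement search, are assumptions chosen for illustration. The shortest segment joining the two curves is found by direct minimization, without using the transversality conditions, and the quantities $1 + y'\varphi'$ and $1 + y'\psi'$ are then evaluated at its end points.

```python
import math


def phi(x):
    """Left boundary curve (an illustrative choice): y = x^2."""
    return x * x


def psi(x):
    """Right boundary curve (an illustrative choice): y = x - 1."""
    return x - 1.0


def shortest_segment(center=(0.0, 0.0), half_width=2.0, rounds=40, n=40):
    """Minimize the length of the segment from (x0, phi(x0)) to (x1, psi(x1))
    by repeatedly refining a square grid in the (x0, x1)-plane."""
    cx0, cx1 = center
    for _ in range(rounds):
        best = None
        for i in range(n + 1):
            x0 = cx0 - half_width + 2 * half_width * i / n
            for j in range(n + 1):
                x1 = cx1 - half_width + 2 * half_width * j / n
                d = math.hypot(x1 - x0, psi(x1) - phi(x0))
                if best is None or d < best[0]:
                    best = (d, x0, x1)
        _, cx0, cx1 = best        # recenter on the best grid point
        half_width *= 0.5         # and shrink the search square
    return cx0, cx1


if __name__ == "__main__":
    x0, x1 = shortest_segment()
    slope = (psi(x1) - phi(x0)) / (x1 - x0)   # y' along the straight extremal
    print("end points:", (x0, phi(x0)), (x1, psi(x1)))
    print("1 + y' phi'(x0) =", 1 + slope * 2 * x0)   # phi'(x) = 2x
    print("1 + y' psi'(x1) =", 1 + slope * 1.0)      # psi'(x) = 1
```

Both printed quantities should be close to zero, i.e., the minimizing segment meets each boundary curve at a right angle.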

The same kind of variable end point problem can be posed for functionals
depending on several functions. For example, consider the following
problem: Among all smooth curves whose end points lie on two given surfaces
$x = \varphi(y, z)$ and $x = \psi(y, z)$, find the curve for which the functional

$$J[y, z] = \int_{x_0}^{x_1} F(x, y, z, y', z') \, dx$$

has an extremum. Setting $n = 2$ in formula (7) of the preceding section, we
obtain the general variation of the functional $J[y, z]$. By the same argument
as in the case of one independent function, we find that the required curve
$y = y(x)$, $z = z(x)$ must again be an extremal, i.e., satisfy the Euler equations

$$F_y - \frac{d}{dx} F_{y'} = 0, \qquad F_z - \frac{d}{dx} F_{z'} = 0.$$

The boundary conditions are now

$$\left[ F_{y'} + \frac{\partial \varphi}{\partial y} (F - y' F_{y'} - z' F_{z'}) \right] \Big|_{x = x_0} = 0,$$

$$\left[ F_{z'} + \frac{\partial \varphi}{\partial z} (F - y' F_{y'} - z' F_{z'}) \right] \Big|_{x = x_0} = 0,$$

$$\left[ F_{y'} + \frac{\partial \psi}{\partial y} (F - y' F_{y'} - z' F_{z'}) \right] \Big|_{x = x_1} = 0,$$

$$\left[ F_{z'} + \frac{\partial \psi}{\partial z} (F - y' F_{y'} - z' F_{z'}) \right] \Big|_{x = x_1} = 0,$$

and are again called the transversality conditions.


15. Broken Extremals. The Weierstrass-Erdmann Conditions 

So far, we have only considered functionals defined for smooth curves,
and hence we have only permitted smooth solutions of variational problems.
However, it is easy to give examples of variational problems which have no
solutions in the class of smooth curves, but which have solutions if we extend
the class of admissible curves to include piecewise smooth curves. Thus,
consider the functional

$$J[y] = \int_{-1}^{1} y^2 (1 - y')^2 \, dx, \qquad y(-1) = 0, \quad y(1) = 1.$$

The greatest lower bound of the values of $J[y]$ for smooth $y = y(x)$ satisfying
the boundary conditions is obviously zero, but it does not achieve this value
for any smooth curve. In fact, the minimum is achieved for the curve

$$y = \begin{cases} 0 & \text{for } -1 \leq x \leq 0, \\ x & \text{for } 0 < x \leq 1, \end{cases}$$

which has a corner (i.e., a discontinuous first derivative) at the point $x = 0$.
Such a piecewise smooth extremal with corners is called a broken extremal.

Another problem involving broken extremals has already been encountered
in Example 2, p. 20. There it is required to find the curve joining two points
$(x_0, y_0)$ and $(x_1, y_1)$ which generates the surface of least area when rotated
about the $x$-axis. As already noted, if $y_0$ and $y_1$ are sufficiently small
compared to $x_1 - x_0$, the solution of the problem is given by the broken
extremal $A x_0 x_1 B$ shown in Fig. 2(b), p. 21. This extremal consists of three
line segments (two vertical and one horizontal) and can be included in the
class of piecewise smooth curves if we set up the problem in parametric form.

Guided by the above considerations, we enlarge the class of admissible
functions, relaxing the requirement that they be smooth everywhere. Thus,
we pose the following problem: Among all functions $y(x)$ which are continuously
differentiable for $a \leq x \leq b$ except possibly at some point $c$ ($a < c < b$),
and which satisfy the boundary conditions

$$y(a) = A, \qquad y(b) = B, \tag{16}$$

find the function for which the functional

$$J[y] = \int_a^b F(x, y, y') \, dx$$

has a weak extremum. It is clear that on each of the intervals $[a, c]$ and
$[c, b]$ the function for which $J[y]$ has an extremum must satisfy the Euler
equation

$$F_y - \frac{d}{dx} F_{y'} = 0. \tag{17}$$

Writing $J[y]$ as a sum of two functionals, i.e.,

$$J[y] = \int_a^b F(x, y, y') \, dx = \int_a^c F(x, y, y') \, dx + \int_c^b F(x, y, y') \, dx \equiv J_1[y] + J_2[y],$$

we calculate the variations $\delta J_1$ and $\delta J_2$ of the two terms separately. The
end points $x = a$, $x = b$ are fixed, and we require that the two “pieces”
of the function $y(x)$ join continuously at $x = c$, but otherwise the point
$x = c$ can move freely. Using formula (5) to write $\delta J_1$ and $\delta J_2$, and recalling
that $y(x)$ is an extremal, we find that

$$\delta J_1 = F_{y'}\big|_{x = c - 0} \delta y_1 + (F - y' F_{y'})\big|_{x = c - 0} \delta x_1,$$

$$\delta J_2 = -F_{y'}\big|_{x = c + 0} \delta y_1 - (F - y' F_{y'})\big|_{x = c + 0} \delta x_1.$$

[The condition that $y(x)$ be continuous at $x = c$ implies that $\delta J_1$ and $\delta J_2$
involve the same increments $\delta x_1$ and $\delta y_1$.] At an extremum we must have

$$\delta J = \delta J_1 + \delta J_2 = 0,$$

and hence

$$\left( F_{y'}\big|_{x = c - 0} - F_{y'}\big|_{x = c + 0} \right) \delta y_1
 + \left[ (F - y' F_{y'})\big|_{x = c - 0} - (F - y' F_{y'})\big|_{x = c + 0} \right] \delta x_1 = 0.$$

Since $\delta x_1$ and $\delta y_1$ are arbitrary, the conditions

$$\begin{aligned}
F_{y'}\big|_{x = c - 0} &= F_{y'}\big|_{x = c + 0}, \\
(F - y' F_{y'})\big|_{x = c - 0} &= (F - y' F_{y'})\big|_{x = c + 0},
\end{aligned} \tag{18}$$

called the Weierstrass-Erdmann (corner) conditions, hold at the point $c$
where the extremal has a corner.

In each of the intervals $[a, c]$ and $[c, b]$, the extremal $y = y(x)$ must
satisfy Euler’s equation (17), i.e., a second-order differential equation.
Solving these two equations, we obtain four arbitrary constants, which can
then be found from the boundary conditions (16) and the Weierstrass-Erdmann
conditions (18).

The Weierstrass-Erdmann conditions take a particularly simple form if
we use the canonical variables

$$p = F_{y'}, \qquad H = -F + y' F_{y'}$$

introduced in Sec. 13. In fact, then the conditions (18) just mean that
the canonical variables are continuous at a point where the extremal has a
corner.
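
For the functional $\int_{-1}^{1} y^2 (1 - y')^2 \, dx$ considered at the beginning of this section, the continuity of $p$ and $H$ at the corner $x = 0$ can be checked directly. The sketch below is illustrative: the helper names, the midpoint quadrature and the straight-line competitor $y = (x + 1)/2$ are our own assumptions.

```python
def F(y, yp):
    """Integrand F(y, y') = y^2 (1 - y')^2 of the example functional."""
    return y * y * (1.0 - yp) ** 2


def p(y, yp):
    """Canonical variable p = F_y' = -2 y^2 (1 - y')."""
    return -2.0 * y * y * (1.0 - yp)


def H(y, yp):
    """Canonical variable H = -F + y' F_y'."""
    return -F(y, yp) + yp * p(y, yp)


def broken(x):
    """Broken extremal: y = 0 on [-1, 0], y = x on (0, 1]."""
    return 0.0 if x <= 0 else x


def broken_slope(x):
    return 0.0 if x <= 0 else 1.0


def line(x):
    """A smooth admissible competitor: the straight line through the end points."""
    return 0.5 * (x + 1.0)


def line_slope(x):
    return 0.5


def J(curve, slope, n=2000):
    """Midpoint-rule approximation of the functional over [-1, 1]."""
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        x = -1.0 + (i + 0.5) * h
        total += F(curve(x), slope(x))
    return total * h


if __name__ == "__main__":
    # One-sided values at the corner x = 0, where y = 0 and y' jumps from 0 to 1.
    print("p:", p(0.0, 0.0), p(0.0, 1.0))
    print("H:", H(0.0, 0.0), H(0.0, 1.0))
    print("J[broken] =", J(broken, broken_slope))
    print("J[line]   =", J(line, line_slope))
```

Both canonical variables vanish on either side of the corner, so the conditions (18) hold there, and the broken curve gives the value zero while the straight line gives a strictly positive value.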

The Weierstrass-Erdmann conditions have the following simple geometric
interpretation: Let $x$ and $y$ take fixed values, plot the value of $y'$ along one
coordinate axis, and plot the values of $F(x, y, y')$ along the other. The
result is a curve, called the indicatrix, representing $F(x, y, y')$ as a function of
$y'$. Then the first of the conditions (18) means that the tangents to the
indicatrix at the points $y'(c - 0)$ and $y'(c + 0)$ are parallel, while the second
condition, which can be written in the form

$$F\big|_{x = c + 0} - F\big|_{x = c - 0} = F_{y'} y'\big|_{x = c + 0} - F_{y'} y'\big|_{x = c - 0},$$

means that the two tangents are not only parallel, but in fact coincide.


PROBLEMS 

1. Justify the application of Theorem 2, p. 13 to the case of variable end point 
problems. 
2. Derive the formula for the general variation of a functional of the form

$$J[y] = \int_{x_0}^{x_1} F(x, y, y') \, dx + G(x_0, y_0, x_1, y_1).$$

3. Derive the formula for the general variation of a functional of the form

$$J[y] = \int_{x_0}^{x_1} F(x, y, y', y'') \, dx.$$

4. Find the curves for which the functional

$$J[y] = \int_0^{\pi/4} (y^2 - y'^2) \, dx$$

can have extrema, given that $y(0) = 0$, while the right-hand end point can
vary along the line $x = \pi/4$.

5. Find the curves for which the functional

$$J[y] = \int_0^{x_1} \frac{\sqrt{1 + y'^2}}{y} \, dx, \qquad y(0) = 0$$

can have extrema if

a) The point $(x_1, y_1)$ can vary along the line $y = x - 5$;

b) The point $(x_1, y_1)$ can vary along the circle $(x - 9)^2 + y^2 = 9$.

Ans. a) $y = \pm\sqrt{10x - x^2}$; b) $y = \pm\sqrt{8x - x^2}$.

6. Find the curve connecting two given circles in the (vertical) plane along 
which a particle falls in the shortest time under the influence of gravity. 

7. Find the shortest distance between the surfaces $z = \varphi(x, y)$ and $z = \psi(x, y)$.

8. Write the transversality conditions for the functional in Prob. 2 if the end
points of the admissible curves $y = y(x)$ lie on two given curves $y = \varphi(x)$
and $y = \psi(x)$.

9. Write the transversality conditions for a functional of the form

$$J[y, z] = \int_{x_0}^{x_1} f(x, y, z) \sqrt{1 + y'^2 + z'^2} \, dx$$

defined for curves whose end points lie on two given surfaces $z = \varphi(x, y)$
and $z = \psi(x, y)$. Interpret the conditions geometrically.

10. Find the curves for which the functional

$$J[y, z] = \int_0^{x_1} (y'^2 + z'^2 + 2yz) \, dx$$

can have extrema, given that $y(0) = z(0) = 0$, while the point $(x_1, y_1, z_1)$
can vary in the plane $x = x_1$.
11. Show that for functionals of the form

$$J[y] = \int_{x_0}^{x_1} f(x, y) \sqrt{1 + y'^2} \, e^{\pm \tan^{-1} y'} \, dx,$$

the transversality conditions reduce to the requirement that the curve $y = y(x)$
intersect the curves $y = \varphi(x)$ and $y = \psi(x)$ [along which its end points vary]
at an angle of 45°.

12. Find the curves for which the functional

$$J[y] = \int_0^1 (1 + y''^2) \, dx$$

can have extrema, given that $y(0) = 0$, $y'(0) = 1$, $y(1) = 1$, while $y'(1)$ can
vary arbitrarily.

13. Minimize the functional

$$J[y] = \int_{-1}^{1} x^{2/3} y'^2 \, dx, \qquad y(-1) = -1, \quad y(1) = 1.$$

Hint. Although the extremal $y = x^{1/3}$ has no derivative at $x = 0$, it is
easily verified by direct calculation that $y = x^{1/3}$ minimizes $J[y]$.

14. Given an extremal $y = y(x)$, possibly only piecewise smooth, of the
functional

$$J[y] = \int_{x_0}^{x_1} F(x, y, y') \, dx, \qquad y(x_0) = y_0, \quad y(x_1) = y_1,$$

suppose that

$$F_{y'y'}[x, y(x), z] \neq 0$$

for all finite $z$. Prove that $y(x)$ is then actually smooth, with a smooth
derivative, in $[x_0, x_1]$.

Hint. Use Theorem 3 of Sec. 4 and the geometric interpretation of the
Weierstrass-Erdmann conditions given at the end of Sec. 15.

15. Prove that the functional

$$\int_{x_0}^{x_1} (a y'^2 + b y y' + c y^2) \, dx, \qquad y(x_0) = y_0, \quad y(x_1) = y_1,$$

where $a \neq 0$, can have no broken extremals.

16. Does the functional

$$J[y] = \int_0^{x_1} y'^3 \, dx, \qquad y(0) = 0, \quad y(x_1) = y_1$$

have broken extremals?

17. Find the extremals of the functional

$$J[y] = \int_0^4 (y' - 1)^2 (y' + 1)^2 \, dx, \qquad y(0) = 0, \quad y(4) = 2$$

which have just one corner.
18. Find the curve for which the functional

$$J[y] = \int_a^b F(x, y, y') \, dx, \qquad y(a) = A, \quad y(b) = B$$

has an extremum if the curve can arrive at $(b, B)$ only after touching a given
curve $y = \varphi(x)$.

19. Given a curve $y = \varphi(x)$ and two points $(a, A)$, $(b, B)$ lying on opposite
sides of the curve, consider the functional

$$J[y] = \int_a^b F(x, y, y') \, dx, \qquad y(a) = A, \quad y(b) = B,$$

where $F(x, y, y') = F_1(x, y, y')$ on the side of the curve corresponding to
$(a, A)$, and $F(x, y, y') = F_2(x, y, y')$ on the side of the curve corresponding to
$(b, B)$. Find the curve $y = y(x)$ for which $J[y]$ has an extremum.

20. Using Fermat’s principle (pp. 34, 36), specialize the results of Probs. 18
and 19 to functionals of the form

$$\int_a^b f(x, y) \sqrt{1 + y'^2} \, dx,$$

thereby deriving the familiar laws of reflection and refraction for light rays.

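21. Find the curves for which the functional

$$J[y] = \int_0^{10} y'^3 \, dx, \qquad y(0) = 0, \quad y(10) = 0$$

can have extrema, given that the admissible curves cannot penetrate the interior
of the circle with equation

$$(x - 5)^2 + y^2 = 9.$$

Ans.

$$y = \begin{cases}
\pm \tfrac{3}{4} x & \text{for } 0 \leq x \leq \tfrac{16}{5}, \\
\pm \sqrt{9 - (x - 5)^2} & \text{for } \tfrac{16}{5} \leq x \leq \tfrac{34}{5}, \\
\mp \tfrac{3}{4} (x - 10) & \text{for } \tfrac{34}{5} \leq x \leq 10.
\end{cases}$$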
THE CANONICAL FORM
OF THE EULER EQUATIONS
AND RELATED TOPICS


As already remarked in Sec. 1, many physical laws can be expressed as
variational principles, i.e., in terms of extremal properties of certain
functionals. In this chapter, we shall illustrate this situation by using variational
methods to study the classical mechanics of a system consisting of a finite
number of particles. For example, we shall show how the trajectories in
phase space of a mechanical system (which describe how the system evolves
in time) can be found as the extremals of a certain functional. By using the
calculus of variations, we can also find quantities connected with a given
physical system which do not change as the system evolves in time. These
and related ideas will be our chief concern here. First, we return to the
subject of canonical variables (introduced in Sec. 13), and discuss the
reduction of the Euler equations to canonical form. Appendix I (p. 208) is closely
related to the subject matter of this chapter, and contains another, independent
derivation of the canonical equations and the Hamilton-Jacobi equation.


16. The Canonical Form of the Euler Equations

The Euler equations corresponding to the functional

$$J[y_1, \ldots, y_n] = \int_a^b F(x, y_1, \ldots, y_n, y_1', \ldots, y_n') \, dx \tag{1}$$
(which depends on $n$ functions) form a system of $n$ second-order differential
equations

$$F_{y_i} - \frac{d}{dx} F_{y_i'} = 0 \qquad (i = 1, \ldots, n). \tag{2}$$

This system can be reduced (in various ways) to a system of $2n$ first-order
differential equations. For example, regarding $y_1', \ldots, y_n'$ as $n$ new functions,
independent of $y_1, \ldots, y_n$, we can write (2) in the form

$$\frac{dy_i}{dx} = y_i', \qquad F_{y_i} - \frac{d}{dx} F_{y_i'} = 0 \qquad (i = 1, \ldots, n), \tag{3}$$

where $y_1, \ldots, y_n, y_1', \ldots, y_n'$ are $2n$ unknown functions, and $x$ is the
independent variable.¹ However, we obtain a much more convenient and symmetric
form of the Euler equations if we replace $x, y_1, \ldots, y_n, y_1', \ldots, y_n'$ by another
set of variables, i.e., the canonical variables introduced in the preceding
chapter. The reader will recall that in Sec. 13, we used the equations

$$p_i = F_{y_i'} \qquad (i = 1, \ldots, n) \tag{4}$$

to write $y_1', \ldots, y_n'$ as functions of the variables²

$$x, y_1, \ldots, y_n, p_1, \ldots, p_n. \tag{5}$$

Then we expressed the function $F(x, y_1, \ldots, y_n, y_1', \ldots, y_n')$ appearing in
(1) in terms of a new function $H(x, y_1, \ldots, y_n, p_1, \ldots, p_n)$ related to $F$ by
the formula

$$H = -F + \sum_{i=1}^{n} y_i' p_i, \tag{6}$$

where the $y_i'$ are regarded as functions of the variables (5). The function
$H$ is called the Hamiltonian (corresponding to the functional $J[y_1, \ldots, y_n]$).
Finally, we introduced the new variables

$$x, y_1, \ldots, y_n, p_1, \ldots, p_n, H, \tag{7}$$


1 In other words, here (and elsewhere in this chapter), we regard the $y_i'$ as new
“variables.” To avoid confusion, it would be preferable to write $z_i$ instead of $y_i'$, but
we shall adhere to the commonly accepted notation. Thus, in cases where we are
concerned with the derivative of a function $y_i$, we shall emphasize this fact by writing
$dy_i/dx$ instead of $y_i'$.

2 As already noted on p. 58, in making the transition from the variables $x, y_1, \ldots, y_n$,
$y_1', \ldots, y_n'$ to the variables $x, y_1, \ldots, y_n, p_1, \ldots, p_n$, we require that the Jacobian

$$\frac{\partial(p_1, \ldots, p_n)}{\partial(y_1', \ldots, y_n')}$$

be nonzero. We shall assume that this condition is satisfied. However, it should be
kept in mind that this condition guarantees only the local “solvability” of the equations
(4) with respect to $y_1', \ldots, y_n'$, but it does not guarantee the possibility of representing
$y_1', \ldots, y_n'$ as functions of $x, y_1, \ldots, y_n, p_1, \ldots, p_n$ which are defined over the whole
region under discussion. Thus, all our considerations have a local character.
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called the canonical variables (corresponding to the functional J[yi,..., >> n ]), 
which were used on p. 58 to write a concise expression for the general variation 
of the functional J[y u • • •, y n ], and on p. 63 to give a simple interpretation 
of the Weierstrass-Erdmann conditions. 

We now show how the Euler equations (3) transform when we go over to 
canonical variables. In order to make this change in the Euler equations, 
we have to express the partial derivatives $F_{y_i}$ (i.e., the partial derivatives of $F$
with respect to $y_i$, evaluated for constant $x, y_1', \ldots, y_n'$) in terms of the partial
derivatives $H_{y_i}$ (evaluated for constant $x, p_1, \ldots, p_n$). 3 The direct evaluation
of these derivatives would be rather formidable. Therefore, to avoid lengthy 
calculations, we write the expression for the differential of the function H. 
Then, using the fact that the first differential of a function does not depend 
on the choice of independent variables (i.e., is invariant under changes 
of the independent variables), we shall obtain the required formulas quite 
easily. 

By the definition of H, we have 

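$$dH = -dF + \sum_{i=1}^{n} p_i\,dy_i' + \sum_{i=1}^{n} y_i'\,dp_i,$$

so that

$$dH = -\frac{\partial F}{\partial x}\,dx - \sum_{i=1}^{n} \frac{\partial F}{\partial y_i}\,dy_i - \sum_{i=1}^{n} \frac{\partial F}{\partial y_i'}\,dy_i' + \sum_{i=1}^{n} p_i\,dy_i' + \sum_{i=1}^{n} y_i'\,dp_i. \tag{8}$$
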


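Ordinarily, before using (8) to obtain expressions for the partial derivatives
of $H$, we would have to express the $dy_i'$ in terms of $x$, $y_i$ and $p_i$. However
(and this is the important feature of the canonical variables), because of the
relations

$$\frac{\partial F}{\partial y_i'} = p_i \qquad (i = 1, \ldots, n),$$

the terms containing $dy_i'$ in (8) cancel each other out, and we obtain

$$dH = -\frac{\partial F}{\partial x}\,dx - \sum_{i=1}^{n} \frac{\partial F}{\partial y_i}\,dy_i + \sum_{i=1}^{n} y_i'\,dp_i. \tag{9}$$

Thus, to obtain the partial derivatives of $H$, we need only write down the
appropriate coefficients of the differentials in the right-hand side of (9), i.e.,

$$\frac{\partial H}{\partial x} = -\frac{\partial F}{\partial x}, \qquad \frac{\partial H}{\partial y_i} = -\frac{\partial F}{\partial y_i}, \qquad \frac{\partial H}{\partial p_i} = y_i'.$$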

3 The notation ordinarily used in analysis to denote partial derivatives suffers from
the familiar defect of not specifying just which variables are held fixed.

In other words, the quantities $\partial F/\partial y_i$ and $y_i'$ are connected with the partial
derivatives of the function $H$ by the formulas

$$y_i' = \frac{\partial H}{\partial p_i}, \qquad \frac{\partial F}{\partial y_i} = -\frac{\partial H}{\partial y_i}. \tag{10}$$

Finally, using (10), we can write the Euler equations (3) in the form

$$\frac{dy_i}{dx} = \frac{\partial H}{\partial p_i}, \qquad \frac{dp_i}{dx} = -\frac{\partial H}{\partial y_i} \qquad (i = 1, \ldots, n). \tag{11}$$

These $2n$ first-order differential equations form a system which is equivalent
to the system (3) and is called the canonical system of Euler equations (or
simply the canonical Euler equations) for the functional (1).
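
The passage from the Euler equations to the canonical system (11) can be checked numerically on a simple case. The following is only an illustrative sketch, not part of the original text: it assumes the integrand $F = \tfrac{1}{2}(y'^2 - y^2)$ (chosen here for convenience), for which $p = F_{y'} = y'$ and $H = -F + y'p = \tfrac{1}{2}(p^2 + y^2)$, so that (11) reads $dy/dx = p$, $dp/dx = -y$. The function names and the Runge-Kutta integrator are assumptions of the sketch.

```python
import math

def hamiltonian(y, p):
    # H = -F + y'p for the assumed integrand F = (y'^2 - y^2)/2, with p = y'
    return 0.5 * (p * p + y * y)

def rhs(y, p):
    # Canonical equations (11): dy/dx = dH/dp = p, dp/dx = -dH/dy = -y
    return p, -y

def rk4_step(y, p, h):
    # One classical fourth-order Runge-Kutta step for the pair (y, p)
    k1y, k1p = rhs(y, p)
    k2y, k2p = rhs(y + 0.5 * h * k1y, p + 0.5 * h * k1p)
    k3y, k3p = rhs(y + 0.5 * h * k2y, p + 0.5 * h * k2p)
    k4y, k4p = rhs(y + h * k3y, p + h * k3p)
    y_new = y + h * (k1y + 2 * k2y + 2 * k3y + k4y) / 6
    p_new = p + h * (k1p + 2 * k2p + 2 * k3p + k4p) / 6
    return y_new, p_new

def integrate(y0, p0, x_end, steps):
    # Integrate the canonical system from x = 0 to x = x_end
    h = x_end / steps
    y, p = y0, p0
    for _ in range(steps):
        y, p = rk4_step(y, p, h)
    return y, p

# The Euler equation for this F is y'' = -y; with y(0) = 1, y'(0) = 0
# its solution is y = cos x, so p = y' = -sin x.
y_end, p_end = integrate(1.0, 0.0, 2.0, 2000)
print(y_end, math.cos(2.0))
print(p_end, -math.sin(2.0))
print(hamiltonian(y_end, p_end))
```

The numerical solution of the canonical system agrees with the solution of the second-order Euler equation, and the value of `hamiltonian` stays at its initial value, in line with the fact that this $F$ does not depend on $x$ explicitly.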


17. First Integrals of the Euler Equations

It will be recalled that a first integral of a system of differential equations is 
a function which has a constant value along each integral curve of the system. 
We now look for first integrals of the canonical system (11), and hence of the 
original system (3) which is equivalent to (11). First, we consider the case 
where the function F defining the functional (1) does not depend on x 
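explicitly, i.e., is of the form $F(y_1, \ldots, y_n, y_1', \ldots, y_n')$. Then the function

$$H = -F + \sum_{i=1}^{n} y_i' p_i$$

also does not depend on $x$ explicitly, and hence

$$\frac{dH}{dx} = \sum_{i=1}^{n} \left( \frac{\partial H}{\partial y_i}\frac{dy_i}{dx} + \frac{\partial H}{\partial p_i}\frac{dp_i}{dx} \right). \tag{12}$$

Using the Euler equations in the canonical form (11), we find that (12)
becomes

$$\frac{dH}{dx} = \sum_{i=1}^{n} \left( \frac{\partial H}{\partial y_i}\frac{\partial H}{\partial p_i} - \frac{\partial H}{\partial p_i}\frac{\partial H}{\partial y_i} \right) = 0$$

along each extremal. 4 Thus, if $F$ does not depend on $x$ explicitly, the function
$H(y_1, \ldots, y_n, p_1, \ldots, p_n)$ is a first integral of the Euler equations. 5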


4 If H depends on x explicitly, the formula 

$$\frac{dH}{dx} = \frac{\partial H}{\partial x}$$

can be derived by the same argument. 

5 Cf. the discussion in Case 2, p. 18 of the integration of Euler’s equation for
functionals which are independent of x.

Next, we consider an arbitrary function of the form

$$\Phi = \Phi(y_1, \ldots, y_n, p_1, \ldots, p_n),$$

and we examine the conditions under which $\Phi$ will be a first integral of the
system (11). We drop the assumption that $F$ does not depend on $x$ explicitly,
and instead we consider the general case. Along each integral curve of the
system (11), we have

$$\frac{d\Phi}{dx} = \sum_{i=1}^{n} \left( \frac{\partial \Phi}{\partial y_i}\frac{dy_i}{dx} + \frac{\partial \Phi}{\partial p_i}\frac{dp_i}{dx} \right) = \sum_{i=1}^{n} \left( \frac{\partial \Phi}{\partial y_i}\frac{\partial H}{\partial p_i} - \frac{\partial \Phi}{\partial p_i}\frac{\partial H}{\partial y_i} \right) = [\Phi, H],$$

where the expression

$$[\Phi, H] = \sum_{i=1}^{n} \left( \frac{\partial \Phi}{\partial y_i}\frac{\partial H}{\partial p_i} - \frac{\partial \Phi}{\partial p_i}\frac{\partial H}{\partial y_i} \right)$$

is called the Poisson bracket of the functions $\Phi$ and $H$. Thus, we have
proved the formula

$$\frac{d\Phi}{dx} = [\Phi, H]. \tag{13}$$

It follows from (13) that a necessary and sufficient condition for a function
$\Phi = \Phi(y_1, \ldots, y_n, p_1, \ldots, p_n)$ to be a first integral of the system of Euler
equations (11) is that the Poisson bracket $[\Phi, H]$ vanish identically. 6
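
The criterion $[\Phi, H] = 0$ can be tried out numerically. The sketch below is an illustration added here, under stated assumptions: it takes $n = 2$, the Hamiltonian $H = \tfrac{1}{2}(p_1^2 + p_2^2) + \tfrac{1}{2}(y_1^2 + y_2^2)$ and the candidate $\Phi = y_1 p_2 - y_2 p_1$ (both hypothetical choices, not from the text), and approximates the partial derivatives in the Poisson bracket by central differences. The helper names `partial` and `poisson_bracket` are invented for this example.

```python
def partial(f, z, k, h=1e-5):
    # Central-difference approximation to the partial derivative of f
    # with respect to the k-th entry of the point z
    zp = list(z)
    zm = list(z)
    zp[k] += h
    zm[k] -= h
    return (f(zp) - f(zm)) / (2 * h)

def poisson_bracket(phi, ham, z, n):
    # z = [y_1, ..., y_n, p_1, ..., p_n]
    # [phi, ham] = sum_i (dphi/dy_i * dham/dp_i - dphi/dp_i * dham/dy_i)
    total = 0.0
    for i in range(n):
        total += (partial(phi, z, i) * partial(ham, z, n + i)
                  - partial(phi, z, n + i) * partial(ham, z, i))
    return total

def oscillator(z):
    # Assumed Hamiltonian: H = (p1^2 + p2^2)/2 + (y1^2 + y2^2)/2
    y1, y2, p1, p2 = z
    return 0.5 * (p1 * p1 + p2 * p2) + 0.5 * (y1 * y1 + y2 * y2)

def angular(z):
    # Candidate first integral: Phi = y1*p2 - y2*p1
    y1, y2, p1, p2 = z
    return y1 * p2 - y2 * p1

def first_coordinate(z):
    # Phi = y1, which is not a first integral of this system
    return z[0]

point = [0.3, -1.2, 0.7, 0.5]
print(poisson_bracket(angular, oscillator, point, 2))           # close to zero
print(poisson_bracket(first_coordinate, oscillator, point, 2))  # close to p1
```

For the first candidate the bracket vanishes (up to rounding), so it is a first integral; for $\Phi = y_1$ the bracket equals $p_1$, which is just the first canonical equation $dy_1/dx = \partial H/\partial p_1$.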


18. The Legendre Transformation

We now consider another method of reducing the Euler equations to 
canonical form, a method which differs from that presented in Sec. 16. 
The idea of this new method is to replace the variational problem under 
consideration by another, equivalent problem, such that the Euler equations 
for the new problem are the same as the canonical Euler equations for the 
original problem. 

18.1. We begin by discussing some related topics from the theory of 
extrema of functions of n variables. First, we consider the case n = 1. 


6 According to the existence theorem for the system (11), there is an integral curve of
the system passing through any given point $(x, y_1, \ldots, y_n, p_1, \ldots, p_n)$. Hence, if
$[\Phi, H] = 0$ along every integral curve, it follows that $[\Phi, H] \equiv 0$. If $\Phi$ (as well as $H$)
depends on $x$ explicitly, it is easily verified that (13) is replaced by

$$\frac{d\Phi}{dx} = \frac{\partial \Phi}{\partial x} + [\Phi, H].$$

Suppose we are looking for an extremum, say a minimum, of the function
$f(\xi)$, and suppose $f(\xi)$ is (strictly) convex, which means that

$$f''(\xi) > 0$$

wherever $f(\xi)$ is defined. We introduce a new independent variable

$$p = f'(\xi), \tag{14}$$

called the tangential coordinate, which is just the slope of the tangent passing
through a given point of the curve $\eta = f(\xi)$. Since by hypothesis

$$\frac{dp}{d\xi} = f''(\xi) > 0,$$

we can use (14) to express $\xi$ in terms of $p$. In fact, since the function $f(\xi)$
is convex, any point of the curve $\eta = f(\xi)$ is uniquely determined by the slope
of its tangent (see Figure 6). Of course, the same is true for a (strictly)
concave function, i.e., a function such that $f''(\xi) < 0$ everywhere.
We now introduce the new function

$$H(p) = -f(\xi) + p\xi, \tag{15}$$

where $\xi$ is regarded as the function of $p$ obtained by solving (14). The
transformation from the variable and function pair $\xi$, $f(\xi)$ to the variable
and function pair $p$, $H(p)$, defined by formulas (14) and (15), is called the
Legendre transformation. It is easy to see that since $f(\xi)$ is convex, so is $H(p)$. [The convex
functions $H(p)$ and $f(\xi)$ are sometimes said to be conjugate.] In fact,

$$dH = -f'(\xi)\,d\xi + p\,d\xi + \xi\,dp = \xi\,dp$$

implies that

$$\frac{dH}{dp} = \xi, \tag{16}$$

and hence

$$\frac{d^2H}{dp^2} = \frac{d\xi}{dp} = \frac{1}{dp/d\xi} = \frac{1}{f''(\xi)} > 0, \tag{17}$$

since $f''(\xi) > 0$. Moreover, if the Legendre transformation is applied to
the pair $p$, $H(p)$, we get back the pair $\xi$, $f(\xi)$. This follows from (16) and
the relation

$$-H(p) + pH'(p) = f(\xi) - pH'(p) + pH'(p) = f(\xi). \tag{18}$$

Thus, the Legendre transformation is an involution, i.e., a transformation
which is its own inverse.

Example. If

$$f(\xi) = \frac{\xi^a}{a} \qquad (a > 1),$$

then

$$f'(\xi) = p = \xi^{a-1},$$

i.e.,

$$\xi = p^{1/(a-1)}.$$

It follows that

$$H = -\frac{\xi^a}{a} + p\xi = -\frac{p^{a/(a-1)}}{a} + p \cdot p^{1/(a-1)} = p^{a/(a-1)}\left(-\frac{1}{a} + 1\right),$$

and therefore

$$H(p) = \frac{p^b}{b},$$

where $b$ is related to $a$ by the formula

$$\frac{1}{a} + \frac{1}{b} = 1.$$
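
The example lends itself to a quick numerical check. The sketch below is an added illustration under stated assumptions (the value $a = 3$, the search interval, and the helper name `legendre_transform` are all choices made here): it solves $p = f'(\xi)$ by bisection, forms $-f(\xi) + p\xi$ as in (15), and compares the result with $p^b/b$. Applying the same routine to $H$ then recovers $f$, which illustrates the involution property.

```python
def legendre_transform(f, df, slope, lo, hi, iters=200):
    # Solve df(xi) = slope by bisection, assuming df is increasing on [lo, hi],
    # then return (-f(xi) + slope * xi, xi) as in formulas (14) and (15).
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if df(mid) < slope:
            lo = mid
        else:
            hi = mid
    xi = 0.5 * (lo + hi)
    return -f(xi) + slope * xi, xi

a = 3.0
b = a / (a - 1.0)          # conjugate exponent, 1/a + 1/b = 1

def f(xi):
    return xi ** a / a

def df(xi):
    return xi ** (a - 1.0)

def H(p):
    return p ** b / b

def dH(p):
    return p ** (b - 1.0)

# Transform f at the slope p = 2 and compare with the closed form p^b / b
H_val, xi_val = legendre_transform(f, df, 2.0, 0.0, 10.0)
print(H_val, H(2.0))

# Transform H at the "slope" xi = 1.3 and compare with f(1.3): the involution
f_back, p_val = legendre_transform(H, dH, 1.3, 0.0, 10.0)
print(f_back, f(1.3))
```

Bisection is used only because it is easy to verify; any root finder would do, provided the derivative is monotonic on the chosen interval.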


Next, we show that if

$$-H(p) + \xi p \tag{19}$$

is regarded as a function of two variables, then

$$f(\xi) = \max_{p}\,[-H(p) + \xi p]. \tag{20}$$

[In fact, we can use (20) instead of (15) to define the function $H(p)$.] To
prove this result, we note that according to (18), the function (19) reduces
to $f(\xi)$ when the condition

$$\frac{\partial}{\partial p}[-H(p) + \xi p] = -H'(p) + \xi = 0,$$

or

$$\xi = H'(p),$$

is satisfied. Thus, $f(\xi)$ is an extremum of the function $-H(p) + \xi p$,
regarded as a function of $p$. Moreover, the extremum is a maximum since

$$\frac{\partial^2}{\partial p^2}[-H(p) + \xi p] = -H''(p) < 0$$

[cf. (17)]. It follows that

$$\min_{\xi} f(\xi) = \min_{\xi} \max_{p}\,[-H(p) + \xi p],$$

i.e., the extremum of $f(\xi)$ is also an extremum of (19), regarded as a function
of two variables.

Similar considerations apply to functions of several independent variables.
Let

$$f(\xi_1, \ldots, \xi_n)$$

be a function of $n$ variables such that

$$\det \|f_{\xi_i \xi_k}\| \neq 0, \tag{21}$$

and let

$$p_i = f_{\xi_i} \qquad (i = 1, \ldots, n). \tag{22}$$

Then, using (22) to write $\xi_1, \ldots, \xi_n$ in terms of $p_1, \ldots, p_n$, we form the
function

$$H(p_1, \ldots, p_n) = -f + \sum_{i=1}^{n} \xi_i p_i.$$

As in the case of one variable, it can be shown that

$$f(\xi_1, \ldots, \xi_n) = \operatorname*{ext}_{p_1, \ldots, p_n} \left[ -H(p_1, \ldots, p_n) + \sum_{i=1}^{n} p_i \xi_i \right]$$

and

$$\operatorname*{ext}_{\xi_1, \ldots, \xi_n} f(\xi_1, \ldots, \xi_n) = \operatorname*{ext}_{\xi_1, \ldots, \xi_n,\, p_1, \ldots, p_n} \left[ -H(p_1, \ldots, p_n) + \sum_{i=1}^{n} p_i \xi_i \right],$$

where ext denotes the operation of taking an extremum with respect to the
indicated variables. In other words, the extremum of $f(\xi_1, \ldots, \xi_n)$ is also
an extremum of

$$-H(p_1, \ldots, p_n) + \sum_{i=1}^{n} p_i \xi_i,$$

regarded as a function of $2n$ variables.

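Remark. If instead of (21), we impose the stronger condition that the
matrix

$$\|f_{\xi_i \xi_k}\|$$

be positive definite, i.e., that the quadratic form

$$\sum_{i,k=1}^{n} f_{\xi_i \xi_k}\,\alpha_i \alpha_k$$

be positive for arbitrary real numbers $\alpha_1, \ldots, \alpha_n$ (not all zero), 7 then

$$f(\xi_1, \ldots, \xi_n) = \max_{p_1, \ldots, p_n} \left[ -H(p_1, \ldots, p_n) + \sum_{i=1}^{n} \xi_i p_i \right]. \tag{23}$$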


7 This is the condition for the function $f(\xi_1, \ldots, \xi_n)$ to be (strictly) convex.

It follows from (23) that

$$-H(p_1, \ldots, p_n) + \sum_{i=1}^{n} p_i \xi_i \leq f(\xi_1, \ldots, \xi_n)$$

for arbitrary $p_1, \ldots, p_n$, i.e.,

$$\sum_{i=1}^{n} p_i \xi_i \leq H(p_1, \ldots, p_n) + f(\xi_1, \ldots, \xi_n),$$

a result known as Young's inequality.
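
A minimal numerical illustration of Young's inequality, added here as a sketch: it assumes the one-variable conjugate pair $f(\xi) = \xi^a/a$, $H(p) = p^b/b$ with $1/a + 1/b = 1$ from the example of Sec. 18.1, restricted to positive $\xi$ and $p$. The function name `young_gap` is hypothetical.

```python
def young_gap(xi, p, a):
    # Returns f(xi) + H(p) - xi * p for f = xi^a / a and H = p^b / b,
    # where b is the conjugate exponent (1/a + 1/b = 1). Young's inequality
    # states that this gap is nonnegative for xi, p > 0.
    b = a / (a - 1.0)
    return xi ** a / a + p ** b / b - xi * p

# The gap is nonnegative on a grid of positive values ...
gaps = [young_gap(0.1 * i, 0.1 * j, 3.0)
        for i in range(1, 40) for j in range(1, 40)]
print(min(gaps))

# ... and vanishes (up to rounding) exactly when p = f'(xi) = xi^(a-1)
print(young_gap(1.5, 1.5 ** 2, 3.0))
```

Equality holds precisely on the curve $p = f'(\xi)$, which is the condition under which the maximum in (23) is attained.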

18.2. We now apply the considerations of Sec. 18.1 to functionals. Given
a functional

$$J[y] = \int_a^b F(x, y, y')\,dx, \tag{24}$$

we set

$$p = F_{y'}(x, y, y') \tag{25}$$

and

$$H(x, y, p) = -F + py'. \tag{26}$$

Here we assume that $F_{y'y'} \neq 0$, so that (25) defines $y'$ as a function of $x$,
$y$ and $p$. Then we introduce the new functional

$$J[y, p] = \int_a^b [-H(x, y, p) + py']\,dx, \tag{27}$$

where $y$ and $p$ are regarded as two independent functions, and $y'$ is the
derivative of $y$. This functional is obviously the same as the original functional
(24), if we choose $p$ to be given by the expression (25). The Euler equations
for the functional (27) are

$$-\frac{\partial H}{\partial y} - \frac{dp}{dx} = 0, \qquad -\frac{\partial H}{\partial p} + \frac{dy}{dx} = 0, \tag{28}$$

i.e., just the canonical equations for the functional (24). If we can show
that the functionals (24) and (27) have their extrema for the same curves,
this will prove that the equation

$$\frac{\partial F}{\partial y} - \frac{d}{dx}\frac{\partial F}{\partial y'} = 0 \tag{29}$$

and the equations (28) are equivalent, thereby providing a new derivation of
the canonical equations, independent of the derivation given in Sec. 16.

First, we observe that the transformation from the variables $x$, $y$, $y'$ and
the function $F$ to the variables $x$, $y$, $p$ and the function $H$, defined by formulas
(25) and (26), is an involution, i.e., if we subject $H(x, y, p)$ to a Legendre
transformation, we get back the function $F(x, y, y')$. In fact, since

$$dH = -\frac{\partial F}{\partial x}\,dx - \frac{\partial F}{\partial y}\,dy + y'\,dp,$$

it follows that

$$\frac{\partial H}{\partial p} = y',$$

and hence

$$-H + p\frac{\partial H}{\partial p} = F - py' + py' = F. \tag{30}$$

[Cf. formula (9) of Sec. 16.]

Next, we note that to prove the equivalence of the variational problems (24)
and (27), it is sufficient to show that $J[y]$ is an extremum of $J[y, p]$ when $p$
is varied and $y$ is held fixed, symbolically

$$J[y] = \operatorname*{ext}_{p} J[y, p], \tag{31}$$

since then an extremum of $J[y, p]$ when both $p$ and $y$ are varied will be an
extremum of $J[y]$. Since $J[y, p]$ does not contain $p'$, to find an extremum
of $J[y, p]$ it is sufficient to find an extremum of the integrand in (27) at every
point (cf. Case 3, p. 19). Thus we have

$$\frac{\partial}{\partial p}[-H + py'] = 0,$$

from which it follows that

$$y' = \frac{\partial H}{\partial p}.$$

But this implies (31), since

$$-H + p\frac{\partial H}{\partial p} = F,$$

according to (30). Thus, we have proved the equivalence of the variational
problems (24) and (27), and of the corresponding Euler equations (28) and
(29). Although we have only considered functionals depending on a single
function, completely analogous considerations apply to the case of functionals
depending on several functions.


Example. Consider the functional

$$\int_a^b (Py'^2 + Qy^2)\,dx, \tag{32}$$

where $P$ and $Q$ are functions of $x$. In this case,

$$p = 2Py', \qquad H = Py'^2 - Qy^2,$$

and hence

$$H = \frac{p^2}{4P} - Qy^2.$$

The corresponding canonical equations are

$$\frac{dp}{dx} = 2Qy, \qquad \frac{dy}{dx} = \frac{p}{2P},$$

while the usual form of the Euler equation for the functional (32) is

$$2yQ - \frac{d}{dx}(2Py') = 0.$$
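
As a short consistency check (a worked step added for illustration, not part of the original derivation), eliminating $p$ from the two canonical equations $dy/dx = p/(2P)$, $dp/dx = 2Qy$ of this example recovers the second-order Euler equation $2Qy - \frac{d}{dx}(2Py') = 0$:

```latex
\begin{aligned}
\frac{dy}{dx} = \frac{p}{2P} \;&\Longrightarrow\; p = 2Py', \\
\frac{dp}{dx} = 2Qy \;&\Longrightarrow\; \frac{d}{dx}\,(2Py') = 2Qy
\;\Longleftrightarrow\; 2Qy - \frac{d}{dx}\,(2Py') = 0 .
\end{aligned}
```

The first canonical equation merely restates the definition of $p$; all of the dynamical content sits in the second.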


19. Canonical Transformations


Next, we look for transformations under which the canonical Euler
equations preserve their canonical form. The reader will recall that in Sec. 8
we proved the invariance of the Euler equation

$$\frac{\partial F}{\partial y} - \frac{d}{dx}\frac{\partial F}{\partial y'} = 0$$

under coordinate transformations of the form

$$u = u(x, y), \qquad v = v(x, y), \qquad \begin{vmatrix} u_x & u_y \\ v_x & v_y \end{vmatrix} \neq 0.$$

(Such transformations change $y'$ to $dv/du$ in the original functional.) The
canonical Euler equations also have this invariance property. Furthermore,
because of the symmetry between the variables $y_i$ and $p_i$ in the canonical
equations, they permit even more general changes of variables, i.e., we can
transform the variables $x$, $y_i$, $p_i$ into new variables $x$,

$$\begin{aligned} Y_i &= Y_i(x, y_1, \ldots, y_n, p_1, \ldots, p_n), \\ P_i &= P_i(x, y_1, \ldots, y_n, p_1, \ldots, p_n). \end{aligned} \tag{33}$$

In other words, we can think of letting the $p_i$ transform according to their
own formulas, independently of how the variables $y_i$ transform. However,
the canonical equations do not preserve their form under all transformations
(33). We now study the conditions which have to be imposed on the
transformations (33) if the Euler equations are to continue to be in canonical
form when written in the new variables, i.e., if the canonical equations are to
transform into new equations

$$\frac{dY_i}{dx} = \frac{\partial H^*}{\partial P_i}, \qquad \frac{dP_i}{dx} = -\frac{\partial H^*}{\partial Y_i}, \tag{34}$$

where $H^* = H^*(x, Y_1, \ldots, Y_n, P_1, \ldots, P_n)$ is some new function.
Transformations of the form (33) which preserve the canonical form of the Euler
equations are called canonical transformations.

To find such canonical transformations, we use the fact that the canonical
equations

$$\frac{dy_i}{dx} = \frac{\partial H}{\partial p_i}, \qquad \frac{dp_i}{dx} = -\frac{\partial H}{\partial y_i} \tag{35}$$

are the Euler equations of the functional

$$J[y_1, \ldots, y_n, p_1, \ldots, p_n] = \int_a^b \left( \sum_{i=1}^{n} p_i y_i' - H \right) dx, \tag{36}$$

in which the $y_i$ and $p_i$ are regarded as $2n$ independent functions. We want
the new variables $Y_i$ and $P_i$ to satisfy the equations (34) for some function
$H^*$. This suggests that we write the functional which has (34) as its Euler
equations. This functional is

$$J^*[Y_1, \ldots, Y_n, P_1, \ldots, P_n] = \int_a^b \left( \sum_{i=1}^{n} P_i Y_i' - H^* \right) dx, \tag{37}$$

where $Y_i$ and $P_i$ are the functions of $x$, $y_i$ and $p_i$ defined by (33), and $Y_i'$
is the derivative of $Y_i$. Thus, the functionals (36) and (37) represent two
different variational problems involving the same variables $y_i$ and $p_i$, and
the requirement that the new system of canonical equations (34) be equivalent
to the old system (35), i.e., that it be possible to obtain (34) from (35) by
making a change of variables (33), is the same as the requirement that the
variational problems corresponding to the functionals (36) and (37) be
equivalent.

In the remarks made on p. 36, it was shown that two variational problems
are equivalent (i.e., have the same extremals) if the integrands of the
corresponding functionals differ from each other by a total differential, which in
this case means that

$$\sum_{i=1}^{n} p_i\,dy_i - H\,dx = \sum_{i=1}^{n} P_i\,dY_i - H^*\,dx + d\Phi(x, y_1, \ldots, y_n, p_1, \ldots, p_n) \tag{38}$$

for some function $\Phi$. Thus, if a given transformation (33) from the variables
$x$, $y_i$, $p_i$ to the variables $x$, $Y_i$, $P_i$ is such that there exists a function $\Phi$
satisfying the condition (38), then the transformation (33) is canonical. In this
case, the function $\Phi$ defined by (38) is called the generating function of the
canonical transformation. The function $\Phi$ is only specified to within an
additive constant, since, as is well known, a function is only specified by its
total differential to within an additive constant.

To justify the term “generating function,” we must show how to actually
find the canonical transformation corresponding to a given generating
function $\Phi$. This is easily done. Writing (38) in the form

$$d\Phi = \sum_{i=1}^{n} p_i\,dy_i - \sum_{i=1}^{n} P_i\,dY_i + (H^* - H)\,dx,$$

we find that 8

$$p_i = \frac{\partial \Phi}{\partial y_i}, \qquad P_i = -\frac{\partial \Phi}{\partial Y_i}, \qquad H^* = H + \frac{\partial \Phi}{\partial x}. \tag{39}$$


8 $\Phi$ is originally a function of $x$, $y_i$ and $p_i$. However, by using (33), we can write $\Phi$
as a function of the variables $x$, $y_i$ and $Y_i$.

Then (39) is precisely the desired canonical transformation. In fact, the
$2n + 1$ equations (39) establish the connection between the old variables
$y_i$, $p_i$ and the new variables $Y_i$, $P_i$, and they also give an expression for the
new Hamiltonian $H^*$. Moreover, it is obvious that (39) satisfies the condition
(38), so that the transformation (39) is indeed canonical. If the generating
function $\Phi$ does not depend on $x$ explicitly, then $H^* = H$. In this case,
to obtain the new Hamiltonian $H^*$, we need only replace $y_i$ and $p_i$ in $H$ by
their expressions in terms of $Y_i$ and $P_i$. 9

In writing (39), we assumed that the generating function is specified as a
function of $x$, the old variables $y_i$ and the new variables $Y_i$:

$$\Phi = \Phi(x, y_1, \ldots, y_n, Y_1, \ldots, Y_n).$$

It may be more convenient to express the generating function in terms of
$y_i$ and $P_i$ instead of $y_i$ and $Y_i$. To this end, we rewrite (38) in the form

$$d\left( \Phi + \sum_{i=1}^{n} P_i Y_i \right) = \sum_{i=1}^{n} p_i\,dy_i + \sum_{i=1}^{n} Y_i\,dP_i + (H^* - H)\,dx,$$

thereby obtaining a new generating function

$$\Phi + \sum_{i=1}^{n} P_i Y_i, \tag{40}$$

which is to be regarded as a function of the variables $x$, $y_i$ and $P_i$. Denoting
(40) by $\Psi(x, y_1, \ldots, y_n, P_1, \ldots, P_n)$, we can write the corresponding
canonical transformation in the form

$$p_i = \frac{\partial \Psi}{\partial y_i}, \qquad Y_i = \frac{\partial \Psi}{\partial P_i}, \qquad H^* = H + \frac{\partial \Psi}{\partial x}. \tag{41}$$
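
Two very simple generating functions may help to make the formulas $p_i = \partial\Phi/\partial y_i$, $P_i = -\partial\Phi/\partial Y_i$ of (39) and $p_i = \partial\Psi/\partial y_i$, $Y_i = \partial\Psi/\partial P_i$ of (41) concrete. These two choices are illustrations added here, not examples taken from the text; neither depends on $x$, so $H^* = H$ in both cases.

```latex
\begin{aligned}
\Phi = \sum_{i=1}^{n} y_i Y_i &: \qquad
p_i = \frac{\partial \Phi}{\partial y_i} = Y_i, \qquad
P_i = -\frac{\partial \Phi}{\partial Y_i} = -y_i, \qquad H^* = H, \\
\Psi = \sum_{i=1}^{n} y_i P_i &: \qquad
p_i = \frac{\partial \Psi}{\partial y_i} = P_i, \qquad
Y_i = \frac{\partial \Psi}{\partial P_i} = y_i, \qquad H^* = H .
\end{aligned}
```

The first transformation interchanges the roles of the $y_i$ and the $p_i$ (up to a sign), which shows how little the distinction between "coordinates" and "momenta" matters in the canonical formalism; the second is the identity transformation.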


20. Noether’s Theorem 

In Sec. 17 we proved that the system of Euler equations corresponding
to the functional

$$\int_a^b F(y_1, \ldots, y_n, y_1', \ldots, y_n')\,dx, \tag{42}$$

where $F$ does not depend on $x$ explicitly, has the first integral

$$H = -F + \sum_{i=1}^{n} y_i' F_{y_i'}.$$

It is clear that the statement “$F$ does not depend on $x$ explicitly” is equivalent
to the statement “$F$, and hence the integral (42), remains the same if we
replace $x$ by the new variable

$$x^* = x + \varepsilon, \tag{43}$$


9 A similar remark holds for the function $\Psi$ in (41).

where $\varepsilon$ is an arbitrary constant.” It follows that $H$ is a first integral of
the system of Euler equations corresponding to the functional (42) if and
only if (42) is invariant under the transformation (43). 10

We now show that even in the general case, there is a connection between
the existence of certain first integrals of a system of Euler equations and the
invariance of the corresponding functional under certain transformations
of the variables $x, y_1, \ldots, y_n$. We begin by defining more precisely what
is meant by the invariance of a functional under some set of transformations.
Suppose we are given a functional 11

$$J[y_1, \ldots, y_n] = \int_{x_0}^{x_1} F(x, y_1, \ldots, y_n, y_1', \ldots, y_n')\,dx,$$

which we write in the concise form

$$J[y] = \int_{x_0}^{x_1} F(x, y, y')\,dx, \tag{44}$$

where now $y$ indicates the $n$-dimensional vector $(y_1, \ldots, y_n)$ and $y'$ the
$n$-dimensional vector $(y_1', \ldots, y_n')$. Consider the transformation

$$\begin{aligned} x^* &= \Phi(x, y_1, \ldots, y_n, y_1', \ldots, y_n') = \Phi(x, y, y'), \\ y_i^* &= \Psi_i(x, y_1, \ldots, y_n, y_1', \ldots, y_n') = \Psi_i(x, y, y'), \end{aligned} \tag{45}$$

where $i = 1, \ldots, n$. The transformation (45) carries the curve $\gamma$, with the
vector equation

$$y = y(x) \qquad (x_0 \leq x \leq x_1),$$

into another curve $\gamma^*$. In fact, replacing $y$, $y'$ in (45) by $y(x)$, $y'(x)$, and
eliminating $x$ from the resulting $n + 1$ equations, we obtain the vector
equation

$$y^* = y^*(x^*) \qquad (x_0^* \leq x^* \leq x_1^*)$$

for $\gamma^*$, where $y^* = (y_1^*, \ldots, y_n^*)$.

Definition. The functional (44) is said to be invariant under the
transformation (45) if $J[\gamma^*] = J[\gamma]$, i.e., if

$$\int_{x_0^*}^{x_1^*} F\left(x^*, y^*, \frac{dy^*}{dx^*}\right) dx^* = \int_{x_0}^{x_1} F\left(x, y, \frac{dy}{dx}\right) dx.$$

10 The fact that $H$ is a first integral only if (42) is invariant under the transformation
(43) follows from the formula

$$\frac{dH}{dx} = \frac{\partial H}{\partial x}$$

(see footnote 4, p. 70), since $\partial H/\partial x = 0$ only if $\partial F/\partial x = 0$.

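11 To avoid confusion in what follows, the reader should note that the subscripts can
play two different roles; when indexing $x$, they refer to different values, while when indexing
$y$, they refer to different functions. For example, the $y_i^*$ are new functions, while $x_0^*$ and
$x_1^*$ are the new positions of the end points of the interval $[x_0, x_1]$.

Example 1. The functional

$$J[y] = \int_{x_0}^{x_1} y'^2\,dx$$

is invariant under the transformation

$$x^* = x + \varepsilon, \qquad y^* = y, \tag{46}$$

where $\varepsilon$ is an arbitrary constant. In fact, given a curve $\gamma$ with equation

$$y = y(x) \qquad (x_0 \leq x \leq x_1),$$

the “transformed” curve $\gamma^*$, i.e., the curve obtained from $\gamma$ by shifting it a
distance $\varepsilon$ along the $x$-axis, has the equation

$$y^* = y(x^* - \varepsilon) = y^*(x^*) \qquad (x_0 + \varepsilon \leq x^* \leq x_1 + \varepsilon),$$

and then

$$J[\gamma^*] = \int_{x_0 + \varepsilon}^{x_1 + \varepsilon} \left[ \frac{dy^*(x^*)}{dx^*} \right]^2 dx^* = \int_{x_0 + \varepsilon}^{x_1 + \varepsilon} \left[ \frac{dy(x^* - \varepsilon)}{dx^*} \right]^2 dx^* = \int_{x_0}^{x_1} \left[ \frac{dy(x)}{dx} \right]^2 dx = J[\gamma].$$
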
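Example 2. The integral

$$J[y] = \int_{x_0}^{x_1} x y'^2\,dx$$

is an example of a functional which is not invariant under the transformation
(46). In fact, carrying out the same calculations as in Example 1, we obtain

$$J[\gamma^*] = \int_{x_0 + \varepsilon}^{x_1 + \varepsilon} x^* \left[ \frac{dy^*(x^*)}{dx^*} \right]^2 dx^* = \int_{x_0}^{x_1} (x + \varepsilon) \left[ \frac{dy(x)}{dx} \right]^2 dx = J[\gamma] + \varepsilon \int_{x_0}^{x_1} y'^2\,dx \neq J[\gamma].$$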
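
The two examples can be compared numerically. The sketch below is an added illustration under stated assumptions: it takes the test curve $y = x^2$ on $[0, 1]$, the shift $\varepsilon = 0.5$, and a midpoint-rule quadrature, and evaluates $\int y'^2\,dx$ and $\int x y'^2\,dx$ on the original and the shifted curve. The helper name `functional` and all numerical values are choices made for this example.

```python
def functional(F, y_prime, a, b, n=20000):
    # Midpoint-rule approximation to the integral of F(x, y'(x)) over [a, b]
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        x = a + (k + 0.5) * h
        total += F(x, y_prime(x)) * h
    return total

eps = 0.5

def dy(x):
    # Derivative of the original curve y = x^2 on [0, 1]
    return 2.0 * x

def dy_shift(x):
    # Derivative of the shifted curve y* = (x* - eps)^2 on [eps, 1 + eps]
    return 2.0 * (x - eps)

def F1(x, yp):
    # Integrand of Example 1: y'^2 (no explicit dependence on x)
    return yp ** 2

def F2(x, yp):
    # Integrand of Example 2: x * y'^2 (explicit dependence on x)
    return x * yp ** 2

J1_orig = functional(F1, dy, 0.0, 1.0)
J1_shift = functional(F1, dy_shift, eps, 1.0 + eps)
J2_orig = functional(F2, dy, 0.0, 1.0)
J2_shift = functional(F2, dy_shift, eps, 1.0 + eps)

print(J1_orig, J1_shift)   # equal: the functional is invariant under the shift
print(J2_orig, J2_shift)   # differ by eps times the integral of y'^2
```

For the second functional the shifted value exceeds the original one by $\varepsilon \int_0^1 y'^2\,dx$, which is the extra term that appears when the calculation of Example 1 is repeated for an integrand containing $x$ explicitly.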



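Suppose now that we have a family of transformations

$$\begin{aligned} x^* &= \Phi(x, y, y'; \varepsilon), \\ y_i^* &= \Psi_i(x, y, y'; \varepsilon), \end{aligned} \tag{47}$$

depending on a parameter $\varepsilon$, where the functions $\Phi$ and $\Psi_i$ $(i = 1, \ldots, n)$
are differentiable with respect to $\varepsilon$, and the value $\varepsilon = 0$ corresponds to the
identity transformation:

$$\begin{aligned} \Phi(x, y, y'; 0) &= x, \\ \Psi_i(x, y, y'; 0) &= y_i. \end{aligned} \tag{48}$$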

Then we have the following result: 


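Theorem (Noether). If the functional

$$J[y] = \int_{x_0}^{x_1} F(x, y, y')\,dx \tag{49}$$

is invariant under the family of transformations (47) for arbitrary $x_0$ and
$x_1$, then

$$\sum_{i=1}^{n} F_{y_i'} \psi_i + \left( F - \sum_{i=1}^{n} y_i' F_{y_i'} \right) \varphi = \text{const} \tag{50}$$

along each extremal of $J[y]$, where

$$\begin{aligned} \varphi(x, y, y') &= \left. \frac{\partial \Phi(x, y, y'; \varepsilon)}{\partial \varepsilon} \right|_{\varepsilon = 0}, \\ \psi_i(x, y, y') &= \left. \frac{\partial \Psi_i(x, y, y'; \varepsilon)}{\partial \varepsilon} \right|_{\varepsilon = 0}. \end{aligned} \tag{51}$$

In other words, every one-parameter family of transformations leaving
$J[y]$ invariant leads to a first integral of its system of Euler equations.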

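Proof. Suppose $\varepsilon$ is a small quantity. Then, by Taylor’s theorem,
we have 12

$$\begin{aligned} x^* &= \Phi(x, y, y'; \varepsilon) = \Phi(x, y, y'; 0) + \varepsilon \left. \frac{\partial \Phi(x, y, y'; \varepsilon)}{\partial \varepsilon} \right|_{\varepsilon = 0} + o(\varepsilon), \\ y_i^* &= \Psi_i(x, y, y'; \varepsilon) = \Psi_i(x, y, y'; 0) + \varepsilon \left. \frac{\partial \Psi_i(x, y, y'; \varepsilon)}{\partial \varepsilon} \right|_{\varepsilon = 0} + o(\varepsilon), \end{aligned}$$

or using (48) and (51),

$$\begin{aligned} x^* &= x + \varepsilon \varphi(x, y, y') + o(\varepsilon), \\ y_i^* &= y_i + \varepsilon \psi_i(x, y, y') + o(\varepsilon). \end{aligned} \tag{52}$$

Assuming that the curve

$$y_i = y_i(x) \qquad (1 \leq i \leq n)$$

is an extremal of $J[y]$, we can use formula (11) of Sec. 13 to write an
expression for the variation of $J[y]$ corresponding to the transformation
(52). Since in the present case 13

$$\delta x = \varepsilon \varphi, \qquad \delta y_i = \varepsilon \psi_i,$$

the result is

$$\delta J = \varepsilon \left[ \sum_{i=1}^{n} F_{y_i'} \psi_i + \left( F - \sum_{i=1}^{n} y_i' F_{y_i'} \right) \varphi \right]_{x = x_0}^{x = x_1}.$$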


12 As usual, $\eta = o(\varepsilon)$ means that $\eta/\varepsilon \to 0$ as $\varepsilon \to 0$.

13 Here $\delta x$, $\delta y_i$ mean the principal linear parts (relative to $\varepsilon$) of the increments $\Delta x$, $\Delta y_i$
of $x$, $y_i$, and not simply $\Delta x$, $\Delta y_i$ as in Sec. 13. It is easy to see that this change in
interpretation has no effect on the final result, and has the advantage of making it unnecessary
to bother with infinitesimals of higher order.

Since by hypothesis, $J[y]$ is invariant under (52), $\delta J$ vanishes, i.e.,

$$\left[ \sum_{i=1}^{n} F_{y_i'} \psi_i + \left( F - \sum_{i=1}^{n} y_i' F_{y_i'} \right) \varphi \right]_{x = x_0} = \left[ \sum_{i=1}^{n} F_{y_i'} \psi_i + \left( F - \sum_{i=1}^{n} y_i' F_{y_i'} \right) \varphi \right]_{x = x_1}.$$

The fact that (50) holds along each extremal now follows from the
arbitrariness of $x_0$ and $x_1$.

Remark. In terms of the canonical variables $p_i$ and $H$, equation (50)
becomes simply

$$\sum_{i=1}^{n} p_i \psi_i - H\varphi = \text{const}. \tag{53}$$

Example 3. Consider the functional

$$J[y] = \int_{x_0}^{x_1} F(y, y')\,dx, \tag{54}$$

whose integrand does not depend on $x$ explicitly. Then, by exactly the
same argument as given in Example 1, $J[y]$ is invariant under the
one-parameter family of transformations

$$x^* = x + \varepsilon, \qquad y_i^* = y_i. \tag{55}$$

In this case,

$$\varphi = 1, \qquad \psi_i = 0,$$

and (53) reduces to just

$$H = \text{const},$$


i.e., the Hamiltonian H is constant along each extremal of J[y]. Thus, we 
again obtain a result already proved in Sec. 17: For a functional of the 
form (54), which does not depend on x explicitly , the Hamiltonian is a first 
integral of the system of Euler equations. 


21. The Principle of Least Action 

We now apply the general results obtained in the preceding sections to 
some mechanical problems. Suppose we are given a system of n particles 
(mass points), where no constraints whatsoever are imposed on the system. 
Let the $i$th particle have mass $m_i$ and coordinates $x_i$, $y_i$, $z_i$ $(i = 1, \ldots, n)$.
Then the kinetic energy of the system is 14

$$T = \frac{1}{2} \sum_{i=1}^{n} m_i (\dot{x}_i^2 + \dot{y}_i^2 + \dot{z}_i^2). \tag{56}$$


14 Here $t$ denotes the time, and the overdot denotes differentiation with respect to $t$.

We assume that the system has potential energy $U$, i.e., that there exists a
function

$$U = U(t, x_1, y_1, z_1, \ldots, x_n, y_n, z_n) \tag{57}$$

such that the force acting on the $i$th particle has components

$$X_i = -\frac{\partial U}{\partial x_i}, \qquad Y_i = -\frac{\partial U}{\partial y_i}, \qquad Z_i = -\frac{\partial U}{\partial z_i}.$$

Next, we introduce the expression

$$L = T - U, \tag{58}$$

called the Lagrangian (function) of the system of particles. Obviously, $L$ is
a function of the time $t$ and of the positions $(x_i, y_i, z_i)$ and velocities $(\dot{x}_i, \dot{y}_i, \dot{z}_i)$
of the $n$ particles in the system.

Suppose that at time $t_0$ the system is in some fixed position. Then the
subsequent evolution of the system in time is described by a curve

$$x_i = x_i(t), \qquad y_i = y_i(t), \qquad z_i = z_i(t) \qquad (i = 1, \ldots, n)$$

in a space of $3n$ dimensions. It can be shown that among all curves passing
through the point corresponding to the initial position of the system, the
curve which actually describes the motion of the given system, under the
influence of the forces acting upon it, satisfies the following condition,
known as the principle of least action:

Theorem. The motion of a system of $n$ particles during the time
interval $[t_0, t_1]$ is described by those functions $x_i(t)$, $y_i(t)$, $z_i(t)$, $1 \leq i \leq n$,
for which the integral

$$\int_{t_0}^{t_1} L\,dt, \tag{59}$$

called the action, is a minimum.


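Proof. We show that the principle of least action implies the usual
equations of motion for a system of $n$ particles. If the functional (59)
has a minimum, then the Euler equations

$$\begin{aligned} \frac{\partial L}{\partial x_i} - \frac{d}{dt}\frac{\partial L}{\partial \dot{x}_i} &= 0, \\ \frac{\partial L}{\partial y_i} - \frac{d}{dt}\frac{\partial L}{\partial \dot{y}_i} &= 0, \\ \frac{\partial L}{\partial z_i} - \frac{d}{dt}\frac{\partial L}{\partial \dot{z}_i} &= 0 \end{aligned} \tag{60}$$

must be satisfied for $i = 1, \ldots, n$. Bearing in mind that the potential
energy $U$ depends only on $t$, $x_i$, $y_i$, $z_i$, and not on $\dot{x}_i$, $\dot{y}_i$, $\dot{z}_i$, while $T$ is a
sum of squares of the velocity components $\dot{x}_i$, $\dot{y}_i$, $\dot{z}_i$ (with coefficients
$\tfrac{1}{2} m_i$), we can write the equations (60) in the form

$$\begin{aligned} -\frac{\partial U}{\partial x_i} - \frac{d}{dt}\,m_i \dot{x}_i &= 0, \\ -\frac{\partial U}{\partial y_i} - \frac{d}{dt}\,m_i \dot{y}_i &= 0, \\ -\frac{\partial U}{\partial z_i} - \frac{d}{dt}\,m_i \dot{z}_i &= 0. \end{aligned} \tag{61}$$

Finally, since the derivatives

$$-\frac{\partial U}{\partial x_i}, \qquad -\frac{\partial U}{\partial y_i}, \qquad -\frac{\partial U}{\partial z_i}$$

are the components of the force acting on the $i$th particle, the system
(61) reduces to

$$m_i \ddot{x}_i = X_i, \qquad m_i \ddot{y}_i = Y_i, \qquad m_i \ddot{z}_i = Z_i,$$

which are just Newton’s equations of motion for a system of $n$ particles,
subject to no constraints.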
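
The minimizing property of the actual motion can be observed numerically in a very simple case. The sketch below is an added illustration under stated assumptions: a single particle of mass $m$ moving vertically in a uniform gravitational field, so that $L = \tfrac{1}{2} m \dot{z}^2 - m g z$, with the Newtonian path $z(t) = v_0 t - \tfrac{1}{2} g t^2$ and the comparison paths $z(t) + \varepsilon \sin(\pi t / T)$, which have the same end points. All numerical values and the function name `action` are choices made for this example.

```python
import math

m, g, v0, T = 1.0, 9.8, 3.0, 1.0   # assumed mass, gravity, initial speed, duration

def action(eps, n=20000):
    # Midpoint-rule approximation to the action integral of L = T - U
    # along the path z(t) = v0*t - g*t^2/2 + eps*sin(pi*t/T), 0 <= t <= T.
    h = T / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        z = v0 * t - 0.5 * g * t * t + eps * math.sin(math.pi * t / T)
        zdot = v0 - g * t + eps * (math.pi / T) * math.cos(math.pi * t / T)
        lagrangian = 0.5 * m * zdot * zdot - m * g * z
        total += lagrangian * h
    return total

eps = 0.1
# For this Lagrangian the first-order terms cancel on the Newtonian path, and the
# excess action equals the kinetic energy of the perturbation alone:
expected = m * eps ** 2 * math.pi ** 2 / (4.0 * T)

print(action(0.0))
print(action(eps) - action(0.0), expected)
print(action(-eps) - action(0.0), expected)
```

The excess is positive and of second order in $\varepsilon$, regardless of the sign of $\varepsilon$, as one expects at a minimum. This checks only one family of comparison paths, so it illustrates the theorem rather than proving it.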

Remark 1. The principle of least action remains valid in the case where the 
system of particles is subject to constraints, except that then the admissible 
curves, for which the functional (59) is considered, have to satisfy the con¬ 
straints. In other words, in this case, application of the principle of least 
action leads to a variational problem with subsidiary conditions. 


Remark 2. Actually, as we shall see later (Sec. 36.2), the principle of
least action only holds for sufficiently small time intervals $[t_0, t_1]$, and has
to be modified for continuous mechanical systems.


22. Conservation Laws 

We have just seen that the equations of motion of a mechanical system 
consisting of n particles, with kinetic energy (56), potential energy (57) and 
Lagrangian (58), can be obtained from the principle of least action, i.e., by 
minimizing the integral 

$$\int_{t_0}^{t_1} L\,dt = \int_{t_0}^{t_1} (T - U)\,dt. \tag{62}$$

The canonical variables corresponding to the functional (62) turn out to be

$$p_{ix} = \frac{\partial L}{\partial \dot{x}_i} = m_i \dot{x}_i, \qquad p_{iy} = \frac{\partial L}{\partial \dot{y}_i} = m_i \dot{y}_i, \qquad p_{iz} = \frac{\partial L}{\partial \dot{z}_i} = m_i \dot{z}_i,$$

which are just the components of the momentum of the $i$th particle. 15 In
terms of $p_{ix}$, $p_{iy}$ and $p_{iz}$, we have

$$H = \sum_{i=1}^{n} (\dot{x}_i p_{ix} + \dot{y}_i p_{iy} + \dot{z}_i p_{iz}) - L = 2T - (T - U) = T + U,$$

so that $H$ is the total energy of the system.

Using the form of the integrand in (62), we can find various functions which 
maintain constant values along each trajectory of the system, thereby 
obtaining so-called conservation laws. 

1. Conservation of energy. Suppose the given system is conservative , 
which means that the Lagrangian L (or more precisely, the potential 
energy U) does not depend on time explicitly. Then, as shown in 
Sec. 17 (see also Sec. 20, Example 3), H = const along each extremal, 
i.e., the total energy of a conservative system does not change during 
the motion of the system. 


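2. Conservation of momentum. First, we recall that according to Noether’s
theorem (Sec. 20), invariance of the functional (49) under the family of
transformations

$$\begin{aligned} x^* &= \Phi(x, y, y'; \varepsilon) = x, \\ y_i^* &= \Psi_i(x, y, y'; \varepsilon) \end{aligned}$$

implies that the corresponding system of Euler equations has the first
integral

$$\sum_{i=1}^{n} F_{y_i'} \psi_i = \text{const},$$

where

$$\psi_i(x, y, y') = \left. \frac{\partial \Psi_i(x, y, y'; \varepsilon)}{\partial \varepsilon} \right|_{\varepsilon = 0},$$

since in this case,

$$\varphi(x, y, y') = \left. \frac{\partial \Phi(x, y, y'; \varepsilon)}{\partial \varepsilon} \right|_{\varepsilon = 0} = 0.$$

Therefore, the invariance of the functional (62) under the transformation

$$x_i^* = x_i + \varepsilon, \qquad y_i^* = y_i, \qquad z_i^* = z_i$$

implies that

$$\sum_{i=1}^{n} \frac{\partial L}{\partial \dot{x}_i} = \text{const},$$

i.e.,

$$\sum_{i=1}^{n} p_{ix} = \text{const}.$$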


15 By analogy with mechanical problems, the variables $p_i = F_{y_i'}$ are often called the
momenta, regardless of the interpretation of the integrand $F$ appearing in the functional (1).

Similarly, it follows from the invariance of (62) under displacements
along the $y$-axis that

$$\sum_{i=1}^{n} p_{iy} = \text{const},$$

and from the invariance of (62) under displacements along the $z$-axis that

$$\sum_{i=1}^{n} p_{iz} = \text{const}.$$

The vector $P$ with components

$$P_x = \sum_{i=1}^{n} p_{ix}, \qquad P_y = \sum_{i=1}^{n} p_{iy}, \qquad P_z = \sum_{i=1}^{n} p_{iz}$$

is called the total momentum of the system. Thus, we have just proved
that the total momentum is conserved during the motion of the system
if the integral (62) is invariant under parallel displacements. [It is clear
from these considerations that the invariance of (62) under displacements
along any coordinate axis, e.g., along the $x$-axis, implies that
the corresponding component of the total momentum is conserved.]

3. Conservation of angular momentum. Suppose the integral (62) is 
invariant under rotations about the z-axis, i.e., under coordinate 
transformations of the form 

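$$\begin{aligned} x_i^* &= x_i \cos \varepsilon + y_i \sin \varepsilon, \\ y_i^* &= -x_i \sin \varepsilon + y_i \cos \varepsilon, \\ z_i^* &= z_i. \end{aligned}$$

In this case,

$$\psi_{ix} = \left. \frac{\partial x_i^*}{\partial \varepsilon} \right|_{\varepsilon = 0} = y_i, \qquad \psi_{iy} = \left. \frac{\partial y_i^*}{\partial \varepsilon} \right|_{\varepsilon = 0} = -x_i, \qquad \psi_{iz} = \left. \frac{\partial z_i^*}{\partial \varepsilon} \right|_{\varepsilon = 0} = 0,$$

and hence Noether’s theorem implies that

$$\sum_{i=1}^{n} \left( \frac{\partial L}{\partial \dot{x}_i}\,y_i - \frac{\partial L}{\partial \dot{y}_i}\,x_i \right) = \text{const},$$

i.e.,

$$\sum_{i=1}^{n} (p_{ix} y_i - p_{iy} x_i) = \text{const}. \tag{63}$$

Each term in this sum represents the $z$-component of the vector product
$p_i \times r_i$, where $r_i = (x_i, y_i, z_i)$ is the position vector and $p_i = (p_{ix}, p_{iy}, p_{iz})$
the momentum of the $i$th particle. The vector $p_i \times r_i$ is called the
angular momentum of the $i$th particle, about the origin of coordinates,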
and (63) means that the sum of the z-components of the angular 
momenta of the separate particles, i.e., the z-component of the total 
angular momentum (of the whole system) is a constant. Similar asser¬ 
tions hold for the x and y-components of the total angular momentum, 
provided that the integral (62) is invariant under rotations about the x 
and y-axes. Thus, we have proved that the total angular momentum 
does not change during the motion of the system if (62) is invariant 
under all rotations. 

Example 1. Consider the motion of a particle which is attracted to a 
fixed point, according to some law. In this case, energy is conserved, since 
L is time-invariant, and angular momentum is also conserved, since L is 
invariant under rotations. However, momentum is not conserved during 
the motion of the particle. 

Example 2. A particle is attracted to a homogeneous linear mass distribution
lying along the z-axis. In this case, the following quantities are
conserved: 

1. The energy (since L is independent of time); 

2. The z-component of the momentum; 

3. The z-component of the angular momentum. 
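
As a rough numerical illustration of Example 1 (not part of the original text), the following Python sketch integrates the plane motion of a unit-mass particle attracted to the origin. The potential V(r) = -1/r, the initial data, the step size and the velocity Verlet scheme are all assumptions chosen for illustration; the function name `simulate` is hypothetical. The energy and the z-component of the angular momentum, written in the convention of (63), should stay essentially constant, while the x-component of the momentum should not.

```python
import math


def simulate(steps=20000, dt=1e-3):
    """Integrate a unit-mass particle attracted to the origin (assumed V = -1/r)
    with velocity Verlet; return a list of (energy, L_z, p_x) samples."""
    x, y = 1.0, 0.0
    vx, vy = 0.0, 0.8

    def accel(px, py):
        r3 = (px * px + py * py) ** 1.5
        return -px / r3, -py / r3

    ax, ay = accel(x, y)
    samples = []
    for _ in range(steps):
        # half kick
        vx += 0.5 * dt * ax
        vy += 0.5 * dt * ay
        # drift
        x += dt * vx
        y += dt * vy
        # second half kick with the updated force
        ax, ay = accel(x, y)
        vx += 0.5 * dt * ax
        vy += 0.5 * dt * ay
        energy = 0.5 * (vx * vx + vy * vy) - 1.0 / math.hypot(x, y)
        # z-component of p x r, as in (63): p_x * y - p_y * x
        l_z = vx * y - vy * x
        samples.append((energy, l_z, vx))
    return samples


if __name__ == "__main__":
    data = simulate()
    for name, column in zip(("energy", "L_z", "p_x"), zip(*data)):
        print(name, "spread:", max(column) - min(column))
```

The velocity Verlet scheme was chosen because, for a central force, each of its substeps leaves the angular momentum unchanged up to rounding, so the conservation law is visible without any tuning.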


23. The Hamilton-Jacobi Equation. Jacobi’s Theorem¹⁶

Consider the functional

$$J[y] = \int_{x_0}^{x_1} F(x, y_1, \ldots, y_n, y_1', \ldots, y_n')\,dx$$ (64)

defined on the curves lying in some region R, and suppose that one and only
one extremal of (64) goes through two arbitrary points A and B. The
integral

$$S = \int_{x_0}^{x_1} F(x, y_1, \ldots, y_n, y_1', \ldots, y_n')\,dx$$ (65)

evaluated along the extremal joining the points

$$A = (x_0, y_1^0, \ldots, y_n^0), \qquad B = (x_1, y_1^1, \ldots, y_n^1)$$ (66)

is called the geodetic distance between A and B. The quantity S is obviously
a single-valued function of the coordinates of the points A and B.


16 In this section, we drop the vector notation introduced in Sec. 20, and revert to the 
more explicit notation used earlier. The vector notation will be used again later 
(e.g., in Sec. 29). 
Example 1. If the functional J is arc length, S is the distance (in the usual
sense) between the points A and B. 

Example 2. Consider the propagation of light in an inhomogeneous and 
anisotropic medium, where it is assumed that the velocity of light at any 
point depends both on the coordinates of the point and on the direction of 
propagation, i.e., 

$$v = v(x, y, z, \dot{x}, \dot{y}, \dot{z}).$$

The time it takes light to go from one point to another along some curve

$$x = x(t), \qquad y = y(t), \qquad z = z(t)$$

is given by the integral

$$T = \int_{t_0}^{t_1} \frac{\sqrt{\dot{x}^2 + \dot{y}^2 + \dot{z}^2}}{v}\,dt.$$ (67)

According to Fermat’s principle, light propagates in any medium along the 
curve for which the transit time T is smallest, i.e., along the extremal of the 
functional (67). Thus, for the functional (67), S is the time it takes light 
to go from the point A to the point B. 

Example 3. Consider a mechanical system with Lagrangian L. According
to Sec. 21, the integral

$$\int_{t_0}^{t_1} L(t, x_1, y_1, z_1, \ldots, x_n, y_n, z_n)\,dt$$

evaluated along the extremal passing through two given points, i.e., two
configurations of the system, is the “least action” corresponding to the
motion of the system from the first configuration to the second.

If the initial point A is regarded as fixed and the final point $B = (x, y_1, \ldots, y_n)$
is regarded as variable,¹⁷ then in the region R,

$$S = S(x, y_1, \ldots, y_n)$$ (68)

is a single-valued function of the coordinates of the point B. We now
derive a differential equation satisfied by the function (68). We first
calculate the partial derivatives

$$\frac{\partial S}{\partial x}, \qquad \frac{\partial S}{\partial y_i} \qquad (i = 1, \ldots, n),$$

by writing down the total differential of the function S, i.e., the principal
linear part of the increment

$$\Delta S = S(x + dx, y_1 + dy_1, \ldots, y_n + dy_n) - S(x, y_1, \ldots, y_n).$$

Since, by definition, $\Delta S$ is the difference

$$J[\gamma^*] - J[\gamma],$$

where $\gamma$ is the extremal going from A to the point $(x, y_1, \ldots, y_n)$ and $\gamma^*$ is
the extremal going from A to the point $(x + dx, y_1 + dy_1, \ldots, y_n + dy_n)$,
we have

$$dS = \delta J,$$

where the “unvaried” curve is the extremal $\gamma$ and the initial point A is held
fixed. (The fact that the “varied” curve $\gamma^*$ is also an extremal is not
important here.)

17 Since B is now variable, we drop the superscript in the second of the formulas (66).

Thus, using formula (12) of Sec. 13 for the general variation of a functional,
we obtain

$$dS(x, y_1, \ldots, y_n) = \delta J = \sum_{i=1}^{n} p_i\,dy_i - H\,dx,$$ (69)

where (69) is evaluated at the point B. It follows that

$$\frac{\partial S}{\partial x} = -H, \qquad \frac{\partial S}{\partial y_i} = p_i,$$ (70)

where¹⁸

$$p_i = p_i(x, y_1, \ldots, y_n) = F_{y_i'}[x, y_1, \ldots, y_n, y_1'(x), \ldots, y_n'(x)]$$ (71)

and

$$H = H[x, y_1, \ldots, y_n, p_1(x, y_1, \ldots, y_n), \ldots, p_n(x, y_1, \ldots, y_n)]$$

are functions of $x, y_1, \ldots, y_n$. Then from (70) we find that S, as a function
of the coordinates of the point B, satisfies the equation

$$\frac{\partial S}{\partial x} + H\left(x, y_1, \ldots, y_n, \frac{\partial S}{\partial y_1}, \ldots, \frac{\partial S}{\partial y_n}\right) = 0.$$ (72)
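
A quick numerical sanity check of equation (72) can be made in a case where everything is known in closed form. The sketch below is an illustration, not part of the original text: it assumes the integrand F = y'^2, whose extremals are straight lines, so that the geodetic distance from a fixed point (x0, y0) is S = (y - y0)^2/(x - x0), and for which p = 2y' and H = p^2/4. The function names and the finite-difference step are hypothetical choices.

```python
def geodetic_distance(x, y, x0=0.0, y0=0.0):
    """S for the assumed integrand F = y'^2: the extremal from (x0, y0) to (x, y)
    is a straight line with slope (y - y0)/(x - x0)."""
    return (y - y0) ** 2 / (x - x0)


def hamilton_jacobi_residual(x, y, eps=1e-5):
    """Central-difference estimate of S_x + H(S_y) with H(p) = p^2 / 4."""
    s_x = (geodetic_distance(x + eps, y) - geodetic_distance(x - eps, y)) / (2 * eps)
    s_y = (geodetic_distance(x, y + eps) - geodetic_distance(x, y - eps)) / (2 * eps)
    return s_x + 0.25 * s_y ** 2


if __name__ == "__main__":
    for point in [(2.0, 3.0), (1.5, -0.7), (4.0, 0.25)]:
        print(point, hamilton_jacobi_residual(*point))
```

The residuals should be zero up to finite-difference error, in agreement with (70): here the partial derivative of S with respect to y equals 2(y - y0)/(x - x0), which is the momentum p = 2y' at the end point B.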

The partial differential equation (72), which is in general nonlinear, is called
the Hamilton-Jacobi equation. There is an intimate connection between the
Hamilton-Jacobi equation and the canonical Euler equations. In fact, the
canonical equations represent the so-called characteristic system associated
with equation (72).¹⁹ We shall approach this matter from a somewhat
different point of view, by establishing a connection between solutions of the
Hamilton-Jacobi equation and first integrals of the system of Euler equations:

Theorem 1. Let

$$S = S(x, y_1, \ldots, y_n, \alpha_1, \ldots, \alpha_m)$$ (73)

be a solution, depending on m (≤ n) parameters $\alpha_1, \ldots, \alpha_m$, of the
Hamilton-Jacobi equation (72). Then each derivative

$$\frac{\partial S}{\partial \alpha_i} \qquad (i = 1, \ldots, m)$$

is a first integral of the system of canonical Euler equations

$$\frac{dy_i}{dx} = \frac{\partial H}{\partial p_i}, \qquad \frac{dp_i}{dx} = -\frac{\partial H}{\partial y_i},$$

i.e.,

$$\frac{\partial S}{\partial \alpha_i} = \text{const} \qquad (i = 1, \ldots, m)$$

along each extremal.

18 In (71), $y_i'(x)$ denotes the derivative $dy_i/dx$ calculated at the point B for the
extremal $\gamma$ going from A to B.

19 See e.g., R. Courant and D. Hilbert, Methods of Mathematical Physics, Vol. II,
Interscience, Inc., New York (1962), Chap. 2, Sec. 8.

Proof. We have to show that

$$\frac{d}{dx}\left(\frac{\partial S}{\partial \alpha_i}\right) = 0 \qquad (i = 1, \ldots, m)$$ (74)



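along each extremal. Calculating the left-hand side of (74), we find
that

$$\frac{d}{dx}\left(\frac{\partial S}{\partial \alpha_i}\right) = \frac{\partial^2 S}{\partial x\,\partial \alpha_i} + \sum_{k=1}^{n} \frac{\partial^2 S}{\partial y_k\,\partial \alpha_i}\,\frac{dy_k}{dx}.$$ (75)

Substituting (73) into the Hamilton-Jacobi equation (72), and
differentiating the result with respect to $\alpha_i$, we obtain

$$\frac{\partial^2 S}{\partial x\,\partial \alpha_i} = -\sum_{k=1}^{n} \frac{\partial H}{\partial p_k}\,\frac{\partial^2 S}{\partial y_k\,\partial \alpha_i}.$$ (76)

Then substitution of (76) into (75) gives

$$\frac{d}{dx}\left(\frac{\partial S}{\partial \alpha_i}\right) = -\sum_{k=1}^{n} \frac{\partial H}{\partial p_k}\,\frac{\partial^2 S}{\partial y_k\,\partial \alpha_i} + \sum_{k=1}^{n} \frac{\partial^2 S}{\partial y_k\,\partial \alpha_i}\,\frac{dy_k}{dx} = \sum_{k=1}^{n} \frac{\partial^2 S}{\partial y_k\,\partial \alpha_i}\left(\frac{dy_k}{dx} - \frac{\partial H}{\partial p_k}\right).$$

Since

$$\frac{dy_k}{dx} - \frac{\partial H}{\partial p_k} = 0 \qquad (k = 1, \ldots, n)$$

along each extremal, it follows that (74) holds along each extremal,
which proves the theorem.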
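
A small worked example may help fix ideas. It is not taken from the original text and assumes the integrand F = y'^2 with n = m = 1, for which p = 2y' and H = p^2/4; the particular one-parameter solution S is simply one convenient choice.

```latex
% Assumed example: F = y'^2, so p = F_{y'} = 2y' and H = -F + y'p = p^2/4.
% The Hamilton-Jacobi equation (72) becomes
\frac{\partial S}{\partial x} + \frac{1}{4}\left(\frac{\partial S}{\partial y}\right)^2 = 0 .
% A solution depending on one parameter \alpha:
S(x, y, \alpha) = \alpha y - \frac{\alpha^2}{4}\,x ,
\qquad
\frac{\partial S}{\partial x} = -\frac{\alpha^2}{4},
\qquad
\frac{\partial S}{\partial y} = \alpha .
% Along an extremal, p = \partial S/\partial y = \alpha and
% dy/dx = \partial H/\partial p = p/2, hence
y = \frac{\alpha}{2}\,x + c .
% The derivative with respect to the parameter is constant there,
% as Theorem 1 asserts:
\frac{\partial S}{\partial \alpha} = y - \frac{\alpha}{2}\,x = c .
```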

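Theorem 2 (Jacobi). Let

$$S = S(x, y_1, \ldots, y_n, \alpha_1, \ldots, \alpha_n)$$ (77)

be a complete integral of the Hamilton-Jacobi equation (72), i.e., a general
solution of (72) depending on n parameters $\alpha_1, \ldots, \alpha_n$. Moreover, let the
determinant of the n × n matrix

$$\left\| \frac{\partial^2 S}{\partial \alpha_i\,\partial y_k} \right\|$$ (78)

be nonzero, and let $\beta_1, \ldots, \beta_n$ be n arbitrary constants. Then the
functions

$$y_i = y_i(x, \alpha_1, \ldots, \alpha_n, \beta_1, \ldots, \beta_n) \qquad (i = 1, \ldots, n)$$ (79)

defined by the relations

$$\frac{\partial}{\partial \alpha_i} S(x, y_1, \ldots, y_n, \alpha_1, \ldots, \alpha_n) = \beta_i \qquad (i = 1, \ldots, n),$$ (80)

together with the functions

$$p_i = \frac{\partial}{\partial y_i} S(x, y_1, \ldots, y_n, \alpha_1, \ldots, \alpha_n) \qquad (i = 1, \ldots, n),$$ (81)

where the $y_i$ are given by (79), constitute a general solution of the canonical
system

$$\frac{dy_i}{dx} = \frac{\partial H}{\partial p_i}, \qquad \frac{dp_i}{dx} = -\frac{\partial H}{\partial y_i} \qquad (i = 1, \ldots, n).$$ (82)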

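Proof 1. According to Theorem 1, the n relations (80) correspond
to first integrals of the canonical system (82). To obtain the general
solution of (82), we first use (80) to define the n functions (79) [this is
possible since (78) has a nonvanishing determinant], and then use (81)
to define the n functions $p_i$. To show that the functions $y_i$ and $p_i$ so
defined actually satisfy the canonical equations (82), we argue as follows:
Differentiating (80) with respect to x, where the $y_i$ are regarded as
functions of x [cf. (79)], we obtain

$$\frac{d}{dx}\left(\frac{\partial S}{\partial \alpha_i}\right) = \frac{\partial^2 S}{\partial x\,\partial \alpha_i} + \sum_{k=1}^{n} \frac{\partial^2 S}{\partial y_k\,\partial \alpha_i}\,\frac{dy_k}{dx} = \sum_{k=1}^{n} \frac{\partial^2 S}{\partial y_k\,\partial \alpha_i}\left(\frac{dy_k}{dx} - \frac{\partial H}{\partial p_k}\right),$$

where in the last step we have used (76). The left-hand side vanishes, since
the right-hand side of (80) is the constant $\beta_i$. Since the determinant of the
matrix (78) is nonzero, it follows that

$$\frac{dy_i}{dx} = \frac{\partial H}{\partial p_i} \qquad (i = 1, \ldots, n),$$ (83)

which is just the first set of equations (82).

Next, we differentiate (81) with respect to x, obtaining

$$\frac{dp_i}{dx} = \frac{d}{dx}\left(\frac{\partial S}{\partial y_i}\right) = \frac{\partial^2 S}{\partial x\,\partial y_i} + \sum_{k=1}^{n} \frac{\partial^2 S}{\partial y_k\,\partial y_i}\,\frac{dy_k}{dx} = \frac{\partial^2 S}{\partial x\,\partial y_i} + \sum_{k=1}^{n} \frac{\partial^2 S}{\partial y_k\,\partial y_i}\,\frac{\partial H}{\partial p_k},$$

where we have used (83). Then, taking account of (81) and differentiating
the Hamilton-Jacobi equation (72) with respect to $y_i$, we
find that

$$\frac{\partial^2 S}{\partial x\,\partial y_i} = -\frac{\partial H}{\partial y_i} - \sum_{k=1}^{n} \frac{\partial H}{\partial p_k}\,\frac{\partial^2 S}{\partial y_k\,\partial y_i}.$$

A comparison of the last two equations shows that

$$\frac{dp_i}{dx} = -\frac{\partial H}{\partial y_i} \qquad (i = 1, \ldots, n),$$

which is just the second set of equations (82).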

Proof 2. Our second proof of Jacobi’s theorem is based on the use
of a canonical transformation. Let (77) be a complete integral of the
Hamilton-Jacobi equation. We make a canonical transformation of
the system (82), choosing the function (77) as the generating function,
$\alpha_1, \ldots, \alpha_n$ as the new momenta (cf. footnote 15, p. 86), and $\beta_1, \ldots, \beta_n$
as the new coordinates. Then, according to formula (41) of Sec. 19,

$$p_i = \frac{\partial S}{\partial y_i}, \qquad \beta_i = \frac{\partial S}{\partial \alpha_i}, \qquad H^* = H + \frac{\partial S}{\partial x}.$$

But since the function S satisfies the Hamilton-Jacobi equation, we
have

$$H^* = H + \frac{\partial S}{\partial x} = 0.$$

Therefore, in the new variables, the canonical equations become

$$\frac{d\alpha_i}{dx} = 0, \qquad \frac{d\beta_i}{dx} = 0,$$

from which it follows that $\alpha_i = \text{const}$, $\beta_i = \text{const}$ along each extremal.
Thus, we again obtain the same n first integrals

$$\frac{\partial S}{\partial \alpha_i} = \beta_i \qquad (i = 1, \ldots, n)$$

of the system of Euler equations. If we now use these equations to determine
the functions (79) of the 2n parameters $\alpha_1, \ldots, \alpha_n, \beta_1, \ldots, \beta_n$,
and if, as before, we set

$$p_i = \frac{\partial}{\partial y_i} S(x, y_1, \ldots, y_n, \alpha_1, \ldots, \alpha_n),$$

where the $y_i$ are given by (79), we obtain 2n functions

$$y_i(x, \alpha_1, \ldots, \alpha_n, \beta_1, \ldots, \beta_n),$$

$$p_i(x, \alpha_1, \ldots, \alpha_n, \beta_1, \ldots, \beta_n),$$

which constitute a general solution of the canonical system (82).
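
Jacobi’s theorem can be tried out on a case that is easy to solve by hand. The following sketch is an illustration under stated assumptions, not part of the original text: it takes n = 1 and the Hamiltonian H = (p^2 + y^2)/2, which corresponds to the integrand F = (y'^2 - y^2)/2. A complete integral is then S = -αx + ∫ sqrt(2α - η^2) dη (from 0 to y), relation (80) gives -x + arcsin(y / sqrt(2α)) = β, and (81) gives p = sqrt(2α - y^2). Solving for y yields the closed forms coded below, with p written on the branch where cos(x + β) > 0 and continued smoothly. The function names and sample parameter values are hypothetical.

```python
import math


def jacobi_solution(x, alpha, beta):
    """y from (80) and p from (81) for the assumed complete integral
    S = -alpha*x + integral_0^y sqrt(2*alpha - eta^2) d(eta)."""
    amplitude = math.sqrt(2.0 * alpha)
    y = amplitude * math.sin(x + beta)
    p = amplitude * math.cos(x + beta)
    return y, p


def canonical_residuals(x, alpha, beta, eps=1e-6):
    """Residuals of dy/dx = dH/dp and dp/dx = -dH/dy for H = (p^2 + y^2)/2,
    with the derivatives in x estimated by central differences."""
    y_plus, p_plus = jacobi_solution(x + eps, alpha, beta)
    y_minus, p_minus = jacobi_solution(x - eps, alpha, beta)
    dy_dx = (y_plus - y_minus) / (2 * eps)
    dp_dx = (p_plus - p_minus) / (2 * eps)
    y, p = jacobi_solution(x, alpha, beta)
    # dH/dp = p and dH/dy = y for the assumed Hamiltonian
    return dy_dx - p, dp_dx + y


if __name__ == "__main__":
    for x in (0.0, 0.4, 1.1):
        print(x, canonical_residuals(x, alpha=0.8, beta=0.3))
```

Both residuals should vanish up to finite-difference error for any α > 0 and β, so the two constants α and β label the general solution of the canonical system, as the theorem states.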
PROBLEMS

1. Use the canonical Euler equations to find the extremals of the functional

$$\int \sqrt{x^2 + y^2}\,\sqrt{1 + y'^2}\,dx,$$

and verify that they agree with those found in Chap. 1, Prob. 22.

Hint. The Hamiltonian is

$$H(x, y, p) = -\sqrt{x^2 + y^2 - p^2},$$

and the corresponding canonical system

$$\frac{dp}{dx} = \frac{y}{\sqrt{x^2 + y^2 - p^2}}, \qquad \frac{dy}{dx} = \frac{p}{\sqrt{x^2 + y^2 - p^2}}$$

has the first integral

$$p^2 - y^2 = C^2,$$

where C is a constant.

2. Consider the action functional

$$J[x] = \frac{1}{2} \int_{t_0}^{t_1} (m\dot{x}^2 - \kappa x^2)\,dt$$

corresponding to a simple harmonic oscillator, i.e., a particle of mass m
acted upon by a restoring force $-\kappa x$ (cf. Sec. 36.2). Write the canonical
system of Euler equations corresponding to J[x], and interpret them. Calculate
the Poisson brackets [x, p], [x, H] and [p, H]. Is p a first integral of
the canonical Euler equations?

3. Use the principle of least action to give a variational formulation of the
problem of the plane motion of a particle of mass m attracted to the origin
of coordinates by a force inversely proportional to the square of its distance
from the origin. Write the corresponding equations of motion, the Hamiltonian
and the canonical system of Euler equations. Calculate the Poisson
brackets $[r, p_r]$, $[\theta, p_\theta]$, $[p_r, H]$ and $[p_\theta, H]$, where

$$p_r = \frac{\partial L}{\partial \dot{r}}, \qquad p_\theta = \frac{\partial L}{\partial \dot{\theta}}.$$

Is $p_\theta$ a first integral of the canonical Euler equations?

Hint. The action functional is

$$J[r, \theta] = \int_{t_0}^{t_1} \left[ \frac{m}{2}\,(\dot{r}^2 + r^2 \dot{\theta}^2) + \frac{k}{r} \right] dt,$$

where k is a constant, and r, θ are the polar coordinates of the particle.

4. Verify that the change of variables

$$Y_i = p_i, \qquad P_i = y_i$$

is a canonical transformation, and find the corresponding generating function.
5. Verify that the functional J[r, θ] of Prob. 3 is invariant under rotations,
and use Noether’s theorem (in polar coordinates) to find the corresponding
conservation law. What geometric fact does this law express?

Ans. The line segment joining the particle to the origin sweeps out equal
areas in equal times.

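6. Write and solve the Hamilton-Jacobi equation corresponding to the
functional

$$J[y] = \int_{x_0}^{x_1} y'^2\,dx,$$

and use the result to determine the extremals of J[y].

Ans. The Hamilton-Jacobi equation is

$$\frac{\partial S}{\partial x} + \frac{1}{4}\left(\frac{\partial S}{\partial y}\right)^2 = 0.$$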

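7. Write and solve the Hamilton-Jacobi equation corresponding to the
functional

$$J[y] = \int_{x_0}^{x_1} f(y)\,\sqrt{1 + y'^2}\,dx,$$

and use the result to find the extremals of J[y].

Ans. The Hamilton-Jacobi equation is

$$\left(\frac{\partial S}{\partial x}\right)^2 + \left(\frac{\partial S}{\partial y}\right)^2 = f^2(y),$$

with solution

$$S = \alpha x + \int_{y_0}^{y} \sqrt{f^2(\eta) - \alpha^2}\,d\eta + \beta.$$

The extremals are

$$x - \alpha \int_{y_0}^{y} \frac{d\eta}{\sqrt{f^2(\eta) - \alpha^2}} = \text{const}.$$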

8. Use the Hamilton-Jacobi equation to find the extremals of the functional 
of Prob. 1. 


Hint. Try a solution of the form $S = \tfrac{1}{2}(Ax^2 + 2Bxy + Cy^2)$.


9. What functional leads to the Hamilton-Jacobi equation 



= 1 ? 


10. Prove that the Hamilton-Jacobi equation can be solved by quadratures 
if it can be written in the form 


11. By a Liouville surface is meant a surface on which the arc-length
functional has the form

$$J[y] = \int_{x_0}^{x_1} \sqrt{\varphi_1(x) + \varphi_2(y)}\,\sqrt{1 + y'^2}\,dx.$$

Prove that the equations of the geodesics on such a surface are

$$\int \frac{dx}{\sqrt{\varphi_1(x) - \alpha}} - \int \frac{dy}{\sqrt{\varphi_2(y) + \alpha}} = \beta,$$

where α and β are constants. Show that surfaces of revolution are Liouville
surfaces.



5 


THE SECOND VARIATION. 
SUFFICIENT CONDITIONS 
FOR A WEAK EXTREMUM 


Until now, in studying extrema of functionals, we have only considered 
a particular necessary condition for a functional to have a weak (relative) 
extremum for a given curve y, i.e., the condition that the variation of the 
functional vanish for the curve y. In this chapter, we shall derive sufficient 
conditions for a functional to have a weak extremum. To find these sufficient 
conditions, we must first introduce a new concept, namely, the second 
variation of a functional. We then study the properties of the second variation,
and at the same time, we derive some new necessary conditions for an
extremum.

As will soon be apparent, there exist sufficient conditions for an extremum
which resemble the necessary conditions and are easy to apply. These
sufficient conditions differ from the necessary conditions (also derived in
this chapter) in much the same way as the sufficient conditions $y' = 0$,
$y'' > 0$ for a function of one variable to have a minimum differ from the
corresponding necessary conditions $y' = 0$, $y'' \geq 0$.


24. Quadratic Functionals. The Second Variation of a Functional 

We begin by introducing some general concepts that will be needed later.
A functional B[x, y] depending on two elements x and y, belonging to some
normed linear space $\mathcal{R}$, is said to be bilinear if it is a linear functional of y for
any fixed x and a linear functional of x for any fixed y (cf. p. 8). Thus,

$$B[x + y, z] = B[x, z] + B[y, z], \qquad B[\alpha x, y] = \alpha B[x, y],$$

and

$$B[x, y + z] = B[x, y] + B[x, z], \qquad B[x, \alpha y] = \alpha B[x, y]$$

for any $x, y, z \in \mathcal{R}$ and any real number α.

If we set y = x in a bilinear functional, we obtain an expression called
a quadratic functional. A quadratic functional A[x] = B[x, x] is said to be
positive definite¹ if A[x] > 0 for every nonzero element x.

A bilinear functional defined on a finite-dimensional space is called a
bilinear form. Every bilinear form B[x, y] can be represented as

$$B[x, y] = \sum_{i,k=1}^{n} b_{ik}\,\xi_i \eta_k,$$

where $\xi_1, \ldots, \xi_n$ and $\eta_1, \ldots, \eta_n$ are the components of the “vectors” x and y
relative to some basis.² If we set y = x in this expression, we obtain a
quadratic form

$$A[x] = B[x, x] = \sum_{i,k=1}^{n} b_{ik}\,\xi_i \xi_k.$$


Example 1. The expression

$$B[x, y] = \int_a^b x(t)\,y(t)\,dt$$

is a bilinear functional defined on the space $\mathcal{C}(a, b)$ of all functions which are
continuous in the interval $a \leq t \leq b$. The corresponding quadratic functional
is

$$A[x] = \int_a^b x^2(t)\,dt.$$

Example 2. A more general bilinear functional defined on $\mathcal{C}(a, b)$ is

$$B[x, y] = \int_a^b \alpha(t)\,x(t)\,y(t)\,dt,$$

where α(t) is a fixed function. If α(t) > 0 for all t in [a, b], then the corresponding
quadratic functional

$$A[x] = \int_a^b \alpha(t)\,x^2(t)\,dt$$

is positive definite.
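
The defining properties of a bilinear functional are easy to check numerically for the functional of Example 2. The following sketch is only an illustration: it approximates the integral by the midpoint rule on [0, 1], and the weight 1 + t, the sample functions and the names `bilinear` and `quadratic` are all assumptions made for the example.

```python
import math


def bilinear(x, y, weight, a=0.0, b=1.0, n=1000):
    """Midpoint-rule approximation of B[x, y] = integral of weight(t)*x(t)*y(t) dt."""
    dt = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * dt
        total += weight(t) * x(t) * y(t)
    return total * dt


def quadratic(x, weight, a=0.0, b=1.0, n=1000):
    """Quadratic functional A[x] = B[x, x] obtained by setting y = x."""
    return bilinear(x, x, weight, a, b, n)


if __name__ == "__main__":
    weight = lambda t: 1.0 + t          # assumed positive weight
    x, y, z = math.sin, math.cos, (lambda t: t * t)
    lhs = bilinear(lambda t: x(t) + z(t), y, weight)
    rhs = bilinear(x, y, weight) + bilinear(z, y, weight)
    print("additivity defect:", lhs - rhs)
    print("A[x] =", quadratic(x, weight))
```

Because the discretized functional is itself a bilinear form in the sample values, additivity and homogeneity hold up to rounding error, and A[x] is positive for any x that does not vanish at every sample point when the weight is positive.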


1 Actually, the word “definite” is redundant here, but will be retained for traditional
reasons. Quadratic functionals A[x] such that A[x] ≥ 0 for all x will simply be called
nonnegative (see p. 103 ff.).

2 See e.g., G. E. Shilov, op. cit., p. 114.
Example 3. The expression

$$A[x] = \int_a^b [\alpha(t)\,x^2(t) + \beta(t)\,x(t)\,x'(t) + \gamma(t)\,x'^2(t)]\,dt$$

is a quadratic functional defined on the space $\mathcal{D}_1(a, b)$ of all functions which are
continuously differentiable in the interval [a, b].

Example 4. The integral

$$B[x, y] = \int_a^b \int_a^b K(s, t)\,x(s)\,y(t)\,ds\,dt,$$

where K(s, t) is a fixed function of two variables, is a bilinear functional
defined on $\mathcal{C}(a, b)$. Replacing y(t) by x(t), we obtain a quadratic functional.

We now introduce the concept of the second variation (or second differential)
of a functional. Let J[y] be a functional defined on some normed linear
space $\mathcal{R}$. In Chapter 1, we called the functional J[y] differentiable if its
increment

$$\Delta J[h] = J[y + h] - J[y]$$

can be written in the form

$$\Delta J[h] = \varphi[h] + \varepsilon \|h\|,$$

where φ[h] is a linear functional and ε → 0 as ‖h‖ → 0. The quantity φ[h]
is the principal linear part of the increment ΔJ[h], and is called the (first)
variation [or (first) differential] of J[y], denoted by δJ[h].

Similarly, we say that the functional J[y] is twice differentiable if its increment
can be written in the form

$$\Delta J[h] = \varphi_1[h] + \varphi_2[h] + \varepsilon \|h\|^2,$$

where $\varphi_1[h]$ is a linear functional (in fact, the first variation), $\varphi_2[h]$ is a quadratic
functional, and ε → 0 as ‖h‖ → 0. The quadratic functional $\varphi_2[h]$ is
called the second variation (or second differential) of the functional J[y],
and is denoted by $\delta^2 J[h]$.³ From now on, it will be tacitly assumed that we
are dealing with functionals which are twice differentiable. The second
variation of such a functional is uniquely defined. This is proved in just
the same way as the uniqueness of the first variation of a differentiable
functional (see Theorem 1 of Sec. 3.2).

Theorem 1. A necessary condition for the functional J[y] to have a
minimum for $y = \hat{y}$ is that

$$\delta^2 J[h] \geq 0$$ (1)

for $y = \hat{y}$ and all admissible h. For a maximum, the sign ≥ in (1) is
replaced by ≤.


3 The comment made in footnote 6, p. 12 applies here as well. 
Proof. By definition, we have

$$\Delta J[h] = \delta J[h] + \delta^2 J[h] + \varepsilon \|h\|^2,$$ (2)

where ε → 0 as ‖h‖ → 0. According to Theorem 2 of Sec. 3.2, δJ[h] = 0
for $y = \hat{y}$ and all admissible h, and hence (2) becomes

$$\Delta J[h] = \delta^2 J[h] + \varepsilon \|h\|^2.$$ (3)

Thus, for sufficiently small ‖h‖, the sign of ΔJ[h] will be the same as the
sign of $\delta^2 J[h]$. Now suppose that $\delta^2 J[h_0] < 0$ for some admissible
$h_0$. Then for any α ≠ 0, no matter how small, we have

$$\delta^2 J[\alpha h_0] = \alpha^2\,\delta^2 J[h_0] < 0.$$

Hence, (3) can be made negative for arbitrarily small ‖h‖. But this is
impossible, since by hypothesis J[y] has a minimum for $y = \hat{y}$, i.e.,

$$\Delta J[h] = J[\hat{y} + h] - J[\hat{y}] \geq 0$$

for all sufficiently small ‖h‖. This contradiction proves the theorem.

The condition $\delta^2 J[h] \geq 0$ is necessary but of course not sufficient for the
functional J[y] to have a minimum for a given function. To obtain a
sufficient condition, we introduce the following concept: We say that a
quadratic functional $\varphi_2[h]$ defined on some normed linear space $\mathcal{R}$ is strongly
positive if there exists a constant k > 0 such that

$$\varphi_2[h] \geq k \|h\|^2$$

for all h.⁴

Theorem 2. A sufficient condition for a functional J[y] to have a minimum
for $y = \hat{y}$, given that the first variation δJ[h] vanishes for $y = \hat{y}$,
is that its second variation $\delta^2 J[h]$ be strongly positive for $y = \hat{y}$.

Proof. For $y = \hat{y}$, we have δJ[h] = 0 for all admissible h, and hence

$$\Delta J[h] = \delta^2 J[h] + \varepsilon \|h\|^2,$$

where ε → 0 as ‖h‖ → 0. Moreover, for $y = \hat{y}$,

$$\delta^2 J[h] \geq k \|h\|^2,$$

where k = const > 0. Thus, for sufficiently small $\varepsilon_1$, $|\varepsilon| < \tfrac{1}{2}k$ if
$\|h\| < \varepsilon_1$. It follows that

$$\Delta J[h] = \delta^2 J[h] + \varepsilon \|h\|^2 > \tfrac{1}{2} k \|h\|^2 > 0$$

if $\|h\| < \varepsilon_1$, i.e., J[y] has a minimum for $y = \hat{y}$, as asserted.


4 In a finite-dimensional space, strong positivity of a quadratic form is equivalent to 
positive definiteness of the quadratic form. Therefore, a function of a finite number of 
variables has a minimum at a point P where its first differential vanishes, if its second 
differential is positive at P. In the general case, however, strong positivity is a stronger 
condition than positive definiteness. 
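
The gap between positive definiteness and strong positivity can be seen in a simple case. The sketch below is an illustration under an explicit assumption: it takes the quadratic functional A[x] = ∫ x²(t) dt on the continuous functions on [0, 1] with the maximum norm, and the "spike" functions x_n(t) = max(0, 1 - nt), which are hypothetical test functions chosen for the example. Each spike has norm 1, yet A[x_n] = 1/(3n) tends to zero, so no constant k > 0 can satisfy A[x] ≥ k‖x‖² even though A[x] > 0 for every nonzero x.

```python
def spike(n):
    """Continuous spike x_n(t) = max(0, 1 - n*t) on [0, 1]; its maximum is 1."""
    return lambda t: max(0.0, 1.0 - n * t)


def quadratic_functional(x, samples=20000):
    """Midpoint-rule approximation of A[x] = integral over [0, 1] of x(t)^2 dt."""
    dt = 1.0 / samples
    return sum(x((i + 0.5) * dt) ** 2 for i in range(samples)) * dt


def ratio(n):
    """A[x_n] / ||x_n||^2 in the maximum norm, for which ||x_n|| = 1."""
    return quadratic_functional(spike(n))


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, ratio(n), 1.0 / (3 * n))
```

The printed ratios should track the exact values 1/(3n) closely. With a different norm on the same space the conclusion may change, which is why the norm is stated as an assumption here.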
25. The Formula for the Second Variation. Legendre’s Condition

Let F(x, y, z) be a function with continuous partial derivatives up to
order three with respect to all its arguments. (Henceforth, similar smoothness
requirements will be assumed to hold whenever needed.) We now
find an expression for the second variation in the case of the simplest variational
problem, i.e., for functionals of the form

$$J[y] = \int_a^b F(x, y, y')\,dx,$$ (4)

defined for curves y = y(x) with fixed end points

$$y(a) = A, \qquad y(b) = B.$$

First, we give the function y(x) an increment h(x) satisfying the boundary
conditions

$$h(a) = 0, \qquad h(b) = 0.$$ (5)

Then, using Taylor’s theorem with remainder, we write the increment of the
functional J[y] as

$$\Delta J[h] = J[y + h] - J[y] = \int_a^b (F_y h + F_{y'} h')\,dx + \frac{1}{2} \int_a^b (\bar{F}_{yy} h^2 + 2\bar{F}_{yy'} h h' + \bar{F}_{y'y'} h'^2)\,dx,$$ (6)

where, as usual, the overbar indicates that the corresponding derivatives are
evaluated along certain intermediate curves, i.e.,

$$\bar{F}_{yy} = F_{yy}(x, y + \theta h, y' + \theta h') \qquad (0 < \theta < 1),$$

and similarly for $\bar{F}_{yy'}$ and $\bar{F}_{y'y'}$.

If we replace $\bar{F}_{yy}$, $\bar{F}_{yy'}$ and $\bar{F}_{y'y'}$ by the derivatives $F_{yy}$, $F_{yy'}$ and $F_{y'y'}$ evaluated
at the point (x, y(x), y'(x)), then (6) becomes

$$\Delta J[h] = \int_a^b (F_y h + F_{y'} h')\,dx + \frac{1}{2} \int_a^b (F_{yy} h^2 + 2F_{yy'} h h' + F_{y'y'} h'^2)\,dx + \varepsilon,$$ (7)

where ε can be written as

$$\int_a^b (\varepsilon_1 h^2 + \varepsilon_2 h h' + \varepsilon_3 h'^2)\,dx.$$ (8)

Because of the continuity of the derivatives $F_{yy}$, $F_{yy'}$ and $F_{y'y'}$, it follows
that $\varepsilon_1, \varepsilon_2, \varepsilon_3 \to 0$ as $\|h\|_1 \to 0$, from which it is apparent that ε is an infinitesimal
of order higher than 2 relative to $\|h\|_1$. The first term in the right-hand
side of (7) is δJ[h], and the second term, which is quadratic in h, is
the second variation $\delta^2 J[h]$. Thus, for the functional (4) we have

$$\delta^2 J[h] = \frac{1}{2} \int_a^b (F_{yy} h^2 + 2F_{yy'} h h' + F_{y'y'} h'^2)\,dx.$$ (9)
(9) 
We now transform (9) into a more convenient form. Integrating by parts
and taking account of (5), we obtain

$$\int_a^b 2F_{yy'}\,h h'\,dx = -\int_a^b \left(\frac{d}{dx} F_{yy'}\right) h^2\,dx.$$

Therefore, (9) can be written as

$$\delta^2 J[h] = \int_a^b (P h'^2 + Q h^2)\,dx,$$ (10)

where

$$P = P(x) = \frac{1}{2} F_{y'y'}, \qquad Q = Q(x) = \frac{1}{2}\left(F_{yy} - \frac{d}{dx} F_{yy'}\right).$$ (11)

This is the expression for the second variation which will be used below.
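
The passage from (9) to (10) can be checked numerically on a concrete integrand. The sketch below is illustrative only and rests on several assumptions: the integrand F = y²y'², the curve y = x on [0, 1] (which need not be an extremal, since formulas (9) to (11) do not require it), the increment h = sin(πx), and a composite Simpson rule. For this choice F_yy = 2y'² = 2, F_yy' = 4yy' = 4x and F_y'y' = 2y² = 2x², so that P = x² and Q = (2 - 4)/2 = -1. All function names are hypothetical.

```python
import math


def simpson(f, a, b, n=1000):
    """Composite Simpson rule with n (even) subintervals."""
    step = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * step)
    return total * step / 3


def increment(x):
    """Assumed increment h(x) = sin(pi x), which vanishes at x = 0 and x = 1."""
    return math.sin(math.pi * x)


def increment_derivative(x):
    return math.pi * math.cos(math.pi * x)


def second_variation_direct():
    """Formula (9) with F_yy = 2, F_yy' = 4x, F_y'y' = 2x^2 along y = x."""
    def integrand(x):
        h, dh = increment(x), increment_derivative(x)
        return 2.0 * h * h + 2.0 * (4.0 * x) * h * dh + (2.0 * x * x) * dh * dh
    return 0.5 * simpson(integrand, 0.0, 1.0)


def second_variation_reduced():
    """Formula (10) with P = x^2 and Q = -1 from (11)."""
    def integrand(x):
        h, dh = increment(x), increment_derivative(x)
        return x * x * dh * dh - h * h
    return simpson(integrand, 0.0, 1.0)


if __name__ == "__main__":
    print(second_variation_direct(), second_variation_reduced())
```

The two numbers should agree up to quadrature error, since the integration by parts only moves the mixed term onto h² and the boundary term vanishes because of (5).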

The following consequence of formulas (7) and (8) should be noted. If
J[y] has an extremum for the curve y = y(x), and if y = y(x) + h(x) is an
admissible curve, then

$$\Delta J[h] = \int_a^b (P h'^2 + Q h^2)\,dx + \int_a^b (\xi h^2 + \eta h'^2)\,dx,$$ (12)

where $\xi, \eta \to 0$ as $\|h\|_1 \to 0$. In fact, since J[y] has an extremum for y = y(x),
the linear terms in the right-hand side of (7) vanish, while the quantity (8)
can be written in the form

$$\int_a^b (\xi h^2 + \eta h'^2)\,dx$$

by integrating the term $\varepsilon_2 h h'$ by parts and using the boundary conditions (5).
Formula (12) will be used later, when we derive sufficient conditions for a
weak extremum (see Sec. 27).

It was proved in Sec. 24 that a necessary condition for a functional J[y] to
have a minimum is that its second variation $\delta^2 J[h]$ be nonnegative. In the
case of a functional of the form (4), we can use formula (10) to establish a
necessary condition for the second variation to be nonnegative. The argument
goes as follows: Consider the quadratic functional (10) for functions
h(x) satisfying the condition h(a) = 0. With this condition, the function
h(x) will be small in the interval [a, b] if its derivative h'(x) is small in [a, b].
However, the converse is not true, i.e., we can construct a function h(x)
which is itself small but has a large derivative h'(x) in [a, b]. This implies
that the term $Ph'^2$ plays the dominant role in the quadratic functional (10),
in the sense that $Ph'^2$ can be much larger than the second term $Qh^2$ but it
cannot be much smaller than $Qh^2$ (it is assumed that P ≠ 0). Therefore,
it might be expected that the coefficient P(x) determines whether the functional
(10) takes values with just one sign or values with both signs. We now
make this qualitative argument precise:
Lemma. A necessary condition for the quadratic functional

$$\delta^2 J[h] = \int_a^b (P h'^2 + Q h^2)\,dx,$$ (13)

defined for all functions $h(x) \in \mathcal{D}_1(a, b)$ such that h(a) = h(b) = 0, to be
nonnegative is that

$$P(x) \geq 0 \qquad (a \leq x \leq b).$$ (14)

Proof. Suppose (14) does not hold, i.e., suppose (without loss of
generality) that $P(x_0) = -2\beta$ ($\beta > 0$) at some point $x_0$ in [a, b]. Then,
since P(x) is continuous, there exists an α > 0 such that $a \leq x_0 - \alpha$,
$x_0 + \alpha \leq b$, and

$$P(x) < -\beta \qquad (x_0 - \alpha \leq x \leq x_0 + \alpha).$$

We now construct a function $h(x) \in \mathcal{D}_1(a, b)$ such that the functional (13)
is negative. In fact, let

$$h(x) = \begin{cases} \sin^2 \dfrac{\pi (x - x_0)}{\alpha} & \text{for } x_0 - \alpha \leq x \leq x_0 + \alpha, \\ 0 & \text{otherwise.} \end{cases}$$ (15)

Then we have

$$\int_a^b (P h'^2 + Q h^2)\,dx = \int_{x_0 - \alpha}^{x_0 + \alpha} P\,\frac{\pi^2}{\alpha^2} \sin^2 \frac{2\pi (x - x_0)}{\alpha}\,dx + \int_{x_0 - \alpha}^{x_0 + \alpha} Q \sin^4 \frac{\pi (x - x_0)}{\alpha}\,dx < -\frac{\beta \pi^2}{\alpha} + 2M\alpha,$$ (16)

where

$$M = \max_{a \leq x \leq b} |Q(x)|.$$

For sufficiently small α, the right-hand side of (16) becomes negative,
and hence (13) is negative for the corresponding function h(x) defined
by (15). This proves the lemma.
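
The mechanism of the proof can be watched numerically. The sketch below is an illustration, not part of the original text: it assumes constant coefficients P = -2 and Q = 5 (so that P is negative everywhere and M = 5), evaluates the functional (13) on the test function (15) by a composite Simpson rule, and compares the value for several widths α. For constant coefficients the integral can also be done by hand and equals Pπ²/α + 3Qα/4. The function names are hypothetical.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson rule with n (even) subintervals."""
    step = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * step)
    return total * step / 3


def lemma_functional(P, Q, alpha, x0=0.0):
    """Integral of P*h'^2 + Q*h^2 for the function h of (15), with constant
    coefficients P and Q (an assumption made for this illustration)."""
    def h(x):
        return math.sin(math.pi * (x - x0) / alpha) ** 2

    def dh(x):
        # derivative of sin^2(pi (x - x0) / alpha)
        return (math.pi / alpha) * math.sin(2.0 * math.pi * (x - x0) / alpha)

    return simpson(lambda x: P * dh(x) ** 2 + Q * h(x) ** 2,
                   x0 - alpha, x0 + alpha)


if __name__ == "__main__":
    for alpha in (3.0, 1.0, 0.1):
        print(alpha, lemma_functional(-2.0, 5.0, alpha))
```

For a wide bump the positive term coming from Q can still dominate, but as α shrinks the derivative term grows like 1/α while the Q term shrinks like α, so the functional eventually becomes negative.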

Using the lemma and the necessary condition for a minimum proved in
Sec. 24, we immediately obtain

Theorem (Legendre). A necessary condition for the functional

$$J[y] = \int_a^b F(x, y, y')\,dx, \qquad y(a) = A, \quad y(b) = B$$

to have a minimum for the curve y = y(x) is that the inequality

$$F_{y'y'} \geq 0$$

(Legendre’s condition) be satisfied at every point of the curve.

Legendre attempted (unsuccessfully) to show that a sufficient condition
for J[y] to have a (weak) minimum for the curve y = y(x) is that the strict
inequality

$$F_{y'y'} > 0$$ (17)

(the strengthened Legendre condition) be satisfied at every point of the curve.
His approach was to first write the second variation (10) in the form

$$\delta^2 J[h] = \int_a^b [P h'^2 + 2w h h' + (Q + w') h^2]\,dx,$$ (18)

where w(x) is an arbitrary differentiable function, using the fact that

$$0 = \int_a^b \frac{d}{dx}(w h^2)\,dx = \int_a^b (w' h^2 + 2w h h')\,dx,$$ (19)

since h(a) = h(b) = 0. Next, he observed that the condition (17) would
indeed be sufficient if it were possible to find a function w(x) for which the
integrand in (18) is a perfect square. However, this is not always possible,
as was first shown by Lagrange, since then w(x) would have to
satisfy the equation

$$P(Q + w') = w^2,$$ (20)

and although this equation is “locally solvable,” it may not have a solution
in a sufficiently large interval.⁵

Actually, the following argument shows that the requirement that

$$F_{y'y'}[x, y(x), y'(x)] > 0$$ (21)

be satisfied at every point of an extremal y = y(x) cannot be a sufficient
condition for the extremal to be a minimum of the functional J[y]. The
condition (21), like the condition

$$F_y - \frac{d}{dx} F_{y'} = 0$$

characterizing the extremal, is of a “local” character, i.e., it does not pertain
to the curve as a whole, but only to individual points of the curve. Therefore,
if the condition (21) holds for any two curves AB and BC, it also holds for
the curve AC formed by joining AB and BC. On the other hand, the fact
that a functional has an extremum for each part AB and BC of some curve
AC does not imply that it has an extremum for the whole curve AC. For
example, a great circle arc on a given sphere is the shortest curve joining
its end points if the arc consists of less than half a circle, but it is not the
shortest curve (even in the class of neighboring curves) if the arc consists of
more than half a circle. However, every great circle arc on a given sphere
is an extremal of the functional which represents arc length on the sphere,
and in fact it is easily verified that for this functional, (21) holds at every
point of the great circle arc. Therefore, (21) cannot be a sufficient condition
for an extremum, nor, for that matter, can any set of purely local conditions
be sufficient.

5 For example, if P = -1, Q = 1, we obtain the equation $w' + 1 + w^2 = 0$, so that
$w(x) = \tan(c - x)$. If $b - a > \pi$, there is no solution in the whole interval [a, b],
since then $\tan(c - x)$ must become infinite somewhere in [a, b].
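
The obstruction described in footnote 5 can be illustrated with a few lines of code. The sketch below is a hedged illustration, not part of the original text: it checks by central differences that w(x) = tan(c - x) satisfies w' + 1 + w² = 0 away from its poles, and it locates a pole inside a given interval [a, b]. The function names, the sample values of a, b and c, and the finite-difference step are assumptions chosen for the example.

```python
import math


def w(x, c):
    """Candidate solution w(x) = tan(c - x) of the equation w' + 1 + w^2 = 0."""
    return math.tan(c - x)


def riccati_residual(x, c, eps=1e-6):
    """Central-difference estimate of w' + 1 + w^2 at a point away from the poles."""
    dw = (w(x + eps, c) - w(x - eps, c)) / (2 * eps)
    return dw + 1.0 + w(x, c) ** 2


def pole_in_interval(a, b, c):
    """Return the smallest pole of tan(c - x) lying in [a, b], or None.
    The poles are at x = c - pi/2 - k*pi for integer k."""
    k = math.floor((c - math.pi / 2 - a) / math.pi)
    x = c - math.pi / 2 - k * math.pi
    return x if a <= x <= b else None


if __name__ == "__main__":
    print("residual:", riccati_residual(0.3, 1.0))
    print("short interval:", pole_in_interval(0.0, 1.0, 0.2))
    for c in (0.0, 0.7, 2.5):
        print("long interval, c =", c, "pole at", pole_in_interval(0.0, 3.2, c))
```

Since consecutive poles are exactly π apart, any interval longer than π must contain one, whatever the value of c, while on a short interval a suitable c keeps the solution finite.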

Although the condition (21) does not guarantee a minimum, the idea of
completing the square of the integrand in formula (18) for the second variation,
with the aim of finding sufficient conditions for an extremum, turns
out to be very fruitful. In fact, the differential equation (20), which comes
to the fore when trying to implement this idea, leads to new necessary
conditions for an extremum (which are no longer local!). We shall discuss
these matters further in the next two sections.


26. Analysis of the Quadratic Functional $\int_a^b (Ph'^2 + Qh^2)\,dx$

As shown in the preceding section, to pursue our study of the “simplest”
variational problem, i.e., that of finding the extrema of the functional

$$J[y] = \int_a^b F(x, y, y')\,dx,$$ (22)

where

$$y(a) = A, \qquad y(b) = B,$$

we have to analyze the quadratic functional⁶

$$\int_a^b (P h'^2 + Q h^2)\,dx,$$ (23)

defined on the set of functions h(x) satisfying the conditions

$$h(a) = 0, \qquad h(b) = 0.$$ (24)

Here, the functions P and Q are related to the function F appearing in the
integrand of (22) by the formulas

$$P = \frac{1}{2} F_{y'y'}, \qquad Q = \frac{1}{2}\left(F_{yy} - \frac{d}{dx} F_{yy'}\right).$$ (25)

For the time being, we ignore the fact that (23) is a second variation, satisfying
the relations (25), and instead, we treat the analysis of (23) as an independent
problem, in its own right.

6 Similarly, the study of extrema of functions of several variables (in particular, the
derivation of sufficient conditions for an extremum) involves the analysis of a quadratic
form (the second differential).

In the last section, we saw that the condition

$$P(x) \geq 0 \qquad (a \leq x \leq b)$$

is necessary but not sufficient for the quadratic functional (23) to be ≥ 0
for all admissible h(x). In this section, it will be assumed that the strengthened
inequality

$$P(x) > 0 \qquad (a \leq x \leq b)$$

holds. We then proceed to find conditions which are both necessary and
sufficient for the functional (23) to be > 0 for all admissible h(x) ≢ 0, i.e.,
to be positive definite. We begin by writing the Euler equation

$$-\frac{d}{dx}(P h') + Q h = 0$$ (26)

corresponding to the functional (23).⁷ This is a linear differential equation
of the second order, which is satisfied, together with the boundary conditions
(24), or more generally, the boundary conditions

$$h(a) = 0, \qquad h(c) = 0 \qquad (a < c \leq b),$$

by the function h(x) ≡ 0. However, in general, (26) can have other, nontrivial
solutions satisfying the same boundary conditions. In this connection,
we introduce the following important concept:

Definition. The point a (/#) is said to be conjugate to the point a if 
the equation (26) has a solution which vanishes for x = a and x = <3 but 
is not identically zero. 

Remark. If h(x) is a solution of (26) which is not identically zero and 
satisfies the conditions h(a) = h(c) = 0, then Ch(x) is also such a solution, 
where C = const ^ 0. Therefore, for definiteness, we can impose some kind 
of normalization on h(x ), and in fact we shall usually assume that the con¬ 
stant C has been chosen to make h\a) = l. 8 
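The definition can be explored numerically. The following is a minimal sketch, not part of the original text: the function name `first_conjugate_point`, the Runge-Kutta scheme and the step count are all illustrative assumptions. It integrates equation (26) with the normalization h(a) = 0, h'(a) = 1 and reports the first point where h vanishes again.

```python
import math

def first_conjugate_point(P, Q, a, b, steps=20000):
    """Return the first x in (a, b] where the solution of -(P h')' + Q h = 0
    with h(a) = 0, h'(a) = 1 vanishes, or None if there is no such point.
    (Hypothetical helper written for this illustration.)"""
    # Rewrite the equation as a first-order system in (h, p) with p = P h':
    #   h' = p / P(x),   p' = Q(x) h
    def rhs(x, h, p):
        return p / P(x), Q(x) * h

    dx = (b - a) / steps
    x, h, p = a, 0.0, P(a)  # p(a) = P(a) * h'(a) = P(a)
    for _ in range(steps):
        # One classical fourth-order Runge-Kutta step.
        k1h, k1p = rhs(x, h, p)
        k2h, k2p = rhs(x + dx / 2, h + dx / 2 * k1h, p + dx / 2 * k1p)
        k3h, k3p = rhs(x + dx / 2, h + dx / 2 * k2h, p + dx / 2 * k2p)
        k4h, k4p = rhs(x + dx, h + dx * k3h, p + dx * k3p)
        h_new = h + dx / 6 * (k1h + 2 * k2h + 2 * k3h + k4h)
        p_new = p + dx / 6 * (k1p + 2 * k2p + 2 * k3p + k4p)
        if h > 0.0 and h_new <= 0.0:
            # Sign change: locate the zero by linear interpolation.
            return x + dx * h / (h - h_new)
        x, h, p = x + dx, h_new, p_new
    return None

# P = 1, Q = -1 gives h'' + h = 0, so h = sin(x - a).
conjugate = first_conjugate_point(lambda x: 1.0, lambda x: -1.0, 0.0, 4.0)
```

For P = 1, Q = -1 the normalized solution is sin x, so the sketch should return a value close to π, while for Q = +1 (solution sinh x) it should find no conjugate point at all.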

The following theorem effectively realizes Legendre’s idea, mentioned on 
p. 104. 

Theorem 1. If

$$P(x) > 0 \qquad (a \leq x \leq b),$$

and if the interval [a, b] contains no points conjugate to a, then the
quadratic functional

$$\int_a^b (Ph'^2 + Qh^2)\,dx \tag{27}$$

is positive definite for all h(x) such that h(a) = h(b) = 0.

7 It must not be thought that this is done in order to find the minimum of the functional
(23). In fact, because of the homogeneity of (23), its minimum is either 0 if the
functional is positive definite, or $-\infty$ otherwise. In the latter case, it is obvious that the
minimum cannot be found from the Euler equation. The importance of the Euler
equation (26) in our analysis of the quadratic functional (23) will become apparent in
Theorem 1. The reader should also not be confused by our use of the same symbol
h(x) to denote both admissible functions, in the domain of the functional (23), and
solutions of equation (26). This notation is convenient, but whereas admissible
functions must satisfy h(a) = h(b) = 0, the condition h(b) = 0 will usually be explicitly
precluded for nontrivial solutions of (26).

8 If $h(x) \not\equiv 0$ and h(a) = 0, then $h'(a)$ must be nonzero, because of the uniqueness
theorem for the linear differential equation (26). See e.g., E. A. Coddington, An
Introduction to Ordinary Differential Equations, Prentice-Hall, Inc., Englewood Cliffs,
New Jersey (1961), pp. 105, 260.

Proof. The fact that the functional (27) is positive definite will be
proved if we can reduce it to the form

$$\int_a^b P(x)\,\varphi^2(\cdots)\,dx,$$

where $\varphi^2(\cdots)$ is some expression which cannot be identically zero unless
$h(x) \equiv 0$. To achieve this, we add a quantity of the form
$\frac{d}{dx}(wh^2)$ to the integrand of (27), where w(x) is a differentiable
function. This will not change the value of the functional (27), since
h(a) = h(b) = 0 implies that

$$\int_a^b \frac{d}{dx}(wh^2)\,dx = 0$$

[cf. equation (19)].

We now select a function w(x) such that the expression 

$$Ph'^2 + Qh^2 + \frac{d}{dx}(wh^2) = Ph'^2 + 2whh' + (Q + w')h^2 \tag{28}$$

is a perfect square. This will be the case if w(x) is chosen to be a
solution of the equation

$$P(Q + w') = w^2 \tag{29}$$

[cf. equation (20)]. In fact, if (29) holds, we can write (28) in the form

$$P\left(h' + \frac{w}{P}h\right)^2.$$

Thus, if (29) has a solution defined on the whole interval [a, b], the
quadratic functional (27) can be transformed into

$$\int_a^b P\left(h' + \frac{w}{P}h\right)^2 dx, \tag{30}$$

and is therefore nonnegative.

Moreover, if (30) vanishes for some function h(x), then obviously

$$h'(x) + \frac{w}{P}h(x) \equiv 0, \tag{31}$$

since P(x) > 0 for $a \leq x \leq b$. Therefore the boundary condition
h(a) = 0 implies $h(x) \equiv 0$, because of the uniqueness theorem for
the first-order differential equation (31). It follows that the functional
(30) is actually positive definite.
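For completeness, the algebra behind the perfect square can be spelled out. This is only a worked restatement of the step from (28) to (30), using nothing beyond (29) and P > 0:

```latex
\begin{aligned}
Ph'^2 + 2whh' + (Q + w')h^2
  &= Ph'^2 + 2whh' + \frac{w^2}{P}\,h^2
     && \text{by (29), since } P > 0 \\
  &= P\left(h'^2 + 2\,\frac{w}{P}\,hh' + \frac{w^2}{P^2}\,h^2\right) \\
  &= P\left(h' + \frac{w}{P}\,h\right)^2 .
\end{aligned}
```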

Thus, the proof of the theorem reduces to showing that the absence of
points in [a, b] which are conjugate to a guarantees that (29) has a solution
defined on the whole interval [a, b]. Equation (29) is a Riccati
equation, which can be reduced to a linear differential equation of the
second order by making a change of variables. In fact, setting

$$w = -\frac{u'}{u}P, \tag{32}$$

where u is a new unknown function, we obtain the equation

$$-\frac{d}{dx}(Pu') + Qu = 0, \tag{33}$$

which is just the Euler equation (26) of the functional (27). If there are
no points conjugate to a in [a, b], then (33) has a solution which does not
vanish anywhere in [a, b],⁹ and then there exists a solution of (29),
given by (32), which is defined on the whole interval [a, b]. This
completes the proof of the theorem.
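The reduction can be checked on a concrete case. The sketch below is an illustration under stated assumptions, not the authors' computation: it takes P = 1, Q = -1 on [0, 2], the non-vanishing solution u = cos(x - 1) of (33), the corresponding w = tan(x - 1) from (32), and a hypothetical trial function h(x) = x(2 - x). It then compares the functional in the form (27) with the form (30) by Simpson's rule.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    step = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * step)
    return total * step / 3

# Illustrative data (assumptions): P = 1, Q = -1 on [0, 2], h(x) = x (2 - x).
b = 2.0
h = lambda x: x * (b - x)
dh = lambda x: b - 2 * x

# w(x) = tan(x - 1) solves P (Q + w') = w^2, i.e. w' = 1 + w^2, and is finite
# on [0, 2] because |x - 1| <= 1 < pi / 2. It equals -u'/u for u = cos(x - 1).
w = lambda x: math.tan(x - 1.0)

original = simpson(lambda x: dh(x) ** 2 - h(x) ** 2, 0.0, b)       # form (27)
as_square = simpson(lambda x: (dh(x) + w(x) * h(x)) ** 2, 0.0, b)  # form (30)
```

Both quadratures should agree (the exact value for this trial function is 8/3 - 16/15 = 1.6), which is what the identity between (27) and (30) predicts when the interval is shorter than π.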

Remark. The reduction of the quadratic functional (27) to the form (30) 
is the continuous analog of the reduction of a quadratic form to a sum of 
squares. The absence of points conjugate to a in the interval [a, b] is the 
analog of the familiar criterion for a quadratic form to be positive definite. 
This connection will be discussed further in Sec. 30. 

Next, we show that the absence of points conjugate to a in the interval 
[a, b ] is not only sufficient but also necessary for the functional (27) to be 
positive definite. 

Lemma. If the function h = h(x) satisfies the equation

$$-\frac{d}{dx}(Ph') + Qh = 0$$

and the boundary conditions

$$h(a) = h(b) = 0, \tag{34}$$

then

$$\int_a^b (Ph'^2 + Qh^2)\,dx = 0.$$

Proof. The lemma is an immediate consequence of the formula

$$0 = \int_a^b \left[-\frac{d}{dx}(Ph') + Qh\right] h\,dx = \int_a^b (Ph'^2 + Qh^2)\,dx,$$

which is obtained by integrating by parts and using (34).

9 If the interval [a, b] contains no points conjugate to a, then, since the solution of the
differential equation (26) depends continuously on the initial conditions, the interval
[a, b] contains no points conjugate to $a - \varepsilon$, for some sufficiently small $\varepsilon$. Therefore,
the solution which satisfies the initial conditions $h(a - \varepsilon) = 0$, $h'(a - \varepsilon) = 1$ does not
vanish anywhere in the interval [a, b]. Implicit in this argument is the assumption that
P does not vanish in [a, b].

Theorem 2. If the quadratic functional

$$\int_a^b (Ph'^2 + Qh^2)\,dx, \tag{35}$$

where

$$P(x) > 0 \qquad (a \leq x \leq b),$$

is positive definite for all h(x) such that h(a) = h(b) = 0, then the interval
[a, b] contains no points conjugate to a.

Proof. The idea of the proof is the following: We construct a family
of positive definite functionals, depending on a parameter t, which for
t = 1 gives the functional (35) and for t = 0 gives the very simple
quadratic functional

$$\int_a^b h'^2\,dx,$$

for which there can certainly be no points in [a, b] conjugate to a.
Then we prove that as the parameter t is varied continuously from 0
to 1, no conjugate points can appear in the interval [a, b].

Thus, consider the functional

$$\int_a^b \left[t(Ph'^2 + Qh^2) + (1 - t)h'^2\right] dx, \tag{36}$$

which is obviously positive definite for all t, $0 \leq t \leq 1$, since (35)
is positive definite by hypothesis. The Euler equation corresponding to
(36) is

$$-\frac{d}{dx}\left\{[tP + (1 - t)]h'\right\} + tQh = 0. \tag{37}$$

Let h(x, t) be the solution of (37) such that h(a, t) = 0, $h_x(a, t) = 1$ for
all t, $0 \leq t \leq 1$. This solution is a continuous function of the parameter
t, which for t = 1 reduces to the solution h(x) of equation (26) satisfying
the boundary conditions h(a) = 0, $h'(a) = 1$, and for t = 0 reduces to the
solution of the equation $h'' = 0$ satisfying the same boundary conditions,
i.e., the function $h = x - a$. We note that if $h(x_0, t_0) = 0$ at some
point $(x_0, t_0)$, then $h_x(x_0, t_0) \neq 0$. In fact, for any fixed t, h(x, t) satisfies
(37), and if the equations $h(x_0, t_0) = 0$, $h_x(x_0, t_0) = 0$ were satisfied
simultaneously, we would have $h(x, t_0) = 0$ for all x, $a \leq x \leq b$, because of
the uniqueness theorem for linear differential equations. But this is
impossible, since $h_x(a, t) = 1$ for all t, $0 \leq t \leq 1$.

Suppose now that the interval [a, b] contains a point $\tilde{a}$ conjugate
to a, i.e., suppose that h(x, 1) vanishes at some point $x = \tilde{a}$ in [a, b].
Then $\tilde{a} \neq b$, since otherwise, according to the lemma,

$$\int_a^b (Ph'^2 + Qh^2)\,dx = 0$$

for a function $h(x) \not\equiv 0$ satisfying the conditions h(a) = h(b) = 0,




which would contradict the assumption that the functional (35) is positive
definite. Therefore, the proof of the theorem reduces to showing that
[a, b] contains no interior point $\tilde{a}$ conjugate to a.

To prove this, we consider the set of all points (x, t), $a \leq x \leq b$,
satisfying the condition h(x, t) = 0.¹⁰ This set, if it is nonempty,
represents a curve in the xt-plane, since at each point where h(x, t) = 0,
the derivative $h_x(x, t)$ is different from zero, and hence, according to the
implicit function theorem, the equation h(x, t) = 0 defines a continuous
function x = x(t) in the neighborhood of each such point.¹¹ By
hypothesis, the point $(\tilde{a}, 1)$ lies on this curve. Thus, starting from the
point $(\tilde{a}, 1)$, the curve (see Figure 7)

A. Cannot terminate inside the rectangle $a \leq x \leq b$, $0 \leq t \leq 1$,
since this would contradict the continuous dependence of the
solution h(x, t) on the parameter t;

B. Cannot intersect the segment x = b, $0 \leq t \leq 1$, since then, by
exactly the same argument as in the lemma [but applied to equation
(37), the boundary conditions h(a, t) = h(b, t) = 0 and the
functional (36)], this would contradict the assumption that the functional
is positive definite for all t;

C. Cannot intersect the segment $a \leq x \leq b$, t = 1, since then for
some t we would have h(x, t) = 0, $h_x(x, t) = 0$ simultaneously;

D. Cannot intersect the segment $a \leq x \leq b$, t = 0, since for
t = 0, equation (37) reduces to $h'' = 0$, whose solution $h = x - a$
would only vanish for x = a;

E. Cannot approach the segment x = a, $0 \leq t \leq 1$, since then for
some t we would have $h_x(a, t) = 0$ [why?], contrary to hypothesis.


10 Recall that h(a, t) = 0 for all t, $0 \leq t \leq 1$.

11 See e.g., D. V. Widder, op. cit., p. 56. See also footnote 8, p. 47.

It follows that no such curve can exist, and hence the proof is
complete.

If we replace the condition that the functional (35) be positive definite
by the condition that it be nonnegative for all admissible h(x), we obtain
the following result:

Theorem 2'. If the quadratic functional

$$\int_a^b (Ph'^2 + Qh^2)\,dx, \tag{38}$$

where

$$P(x) > 0 \qquad (a \leq x \leq b),$$

is nonnegative for all h(x) such that h(a) = h(b) = 0, then the interval
[a, b] contains no interior points conjugate to a.¹²

Proof. If the functional (38) is nonnegative, the functional (36) is
positive definite for all t except possibly t = 1. Thus, the proof of
Theorem 2 remains valid, except for the use of the lemma to prove that
$\tilde{a} = b$ is impossible. Therefore, with the hypotheses of Theorem 2',
the possibility that $\tilde{a} = b$ is not excluded.

Combining Theorems 1 and 2, we finally obtain

Theorem 3. The quadratic functional

$$\int_a^b (Ph'^2 + Qh^2)\,dx,$$

where

$$P(x) > 0 \qquad (a \leq x \leq b),$$

is positive definite for all h(x) such that h(a) = h(b) = 0 if and only if
the interval [a, b] contains no points conjugate to a.
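A small numerical experiment makes the criterion concrete. The sketch below is an illustration only, with assumed data: P = 1, Q = -1 on [0, b], for which the point conjugate to 0 is π, and the trial function h(x) = sin(πx/b), which vanishes at both end points. A single trial function cannot prove positive definiteness, but a negative or zero value is enough to disprove it.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    step = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * step)
    return total * step / 3

def quadratic_functional(b):
    """Integral of h'^2 - h^2 over [0, b] for the trial function
    h(x) = sin(pi x / b), i.e. the functional with P = 1, Q = -1."""
    k = math.pi / b
    integrand = lambda x: (k * math.cos(k * x)) ** 2 - math.sin(k * x) ** 2
    return simpson(integrand, 0.0, b)

short = quadratic_functional(3.0)          # b < pi: no conjugate point in [0, b]
critical = quadratic_functional(math.pi)   # b = pi: the end point is conjugate
long_ = quadratic_functional(3.5)          # b > pi: conjugate point inside
```

The value is positive for b = 3, vanishes for b = π (in agreement with the lemma, since sin x then solves the Euler equation), and is negative for b = 3.5, so the functional fails to be positive definite as soon as [0, b] contains the conjugate point.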


27. Jacobi’s Necessary Condition. More on Conjugate Points 

We now apply the results obtained in the preceding section to the simplest
variational problem, i.e., to the functional

$$\int_a^b F(x, y, y')\,dx \tag{39}$$

with the boundary conditions

$$y(a) = A, \qquad y(b) = B.$$


12 In other words, the solution of the equation

$$-\frac{d}{dx}(Ph') + Qh = 0$$

satisfying the initial conditions h(a) = 0, h'(a) = 1 does not vanish at any interior point
of the interval [a, b].

It will be recalled from Sec. 25 that the second variation of the functional
(39) [in the neighborhood of some extremal y = y(x)] is given by

$$\int_a^b (Ph'^2 + Qh^2)\,dx, \tag{40}$$

where

$$P = \frac{1}{2} F_{y'y'}, \qquad Q = \frac{1}{2}\left(F_{yy} - \frac{d}{dx} F_{yy'}\right). \tag{41}$$

Definition 1. The Euler equation

$$-\frac{d}{dx}(Ph') + Qh = 0 \tag{42}$$

of the quadratic functional (40) is called the Jacobi equation of the original
functional (39).

Definition 2. The point $\tilde{a}$ is said to be conjugate to the point a with
respect to the functional (39) if it is conjugate to a with respect to the
quadratic functional (40) which is the second variation of (39), i.e., if it
is conjugate to a in the sense of the definition on p. 106.


Theorem (Jacobi's necessary condition). If the extremal y = y(x)
corresponds to a minimum of the functional

$$\int_a^b F(x, y, y')\,dx,$$

and if

$$F_{y'y'} > 0$$

along this extremal, then the open interval (a, b) contains no points
conjugate to a.¹³


Proof. In Sec. 24 it was proved that nonnegativity of the second
variation is a necessary condition for a minimum. Moreover, according
to Theorem 2' of Sec. 26, if the quadratic functional (40) is nonnegative,
the interval (a, b) can contain no points conjugate to a. The theorem
follows at once from these two facts taken together.


We have just defined the Jacobi equation of the functional (39) as the Euler
equation of the quadratic functional (40), which represents the second
variation of (39). We can also derive Jacobi's equation by the following
argument: Given that y = y(x) is an extremal, let us examine the conditions
which have to be imposed on h(x) if the varied curve y = y*(x) = y(x) + h(x)
is to be an extremal also. Substituting y(x) + h(x) into Euler's equation

$$F_y(x, y + h, y' + h') - \frac{d}{dx} F_{y'}(x, y + h, y' + h') = 0,$$


13 Of course, the theorem remains true if we replace the word “minimum” by
“maximum” and the condition $F_{y'y'} > 0$ by $F_{y'y'} < 0$.

using Taylor's formula, and bearing in mind that y(x) is already a solution
of Euler's equation, we find that

$$F_{yy}h + F_{yy'}h' - \frac{d}{dx}\left(F_{yy'}h + F_{y'y'}h'\right) = o(h),$$

where o(h) denotes an infinitesimal of order higher than 1 relative to h and
its derivative. Neglecting o(h) and combining terms, we obtain the linear
differential equation

$$\left(F_{yy} - \frac{d}{dx}F_{yy'}\right)h - \frac{d}{dx}\left(F_{y'y'}h'\right) = 0;$$

this is just Jacobi's equation, which we previously wrote in the form (42),
using the notation (41). In other words, Jacobi's equation, except for
infinitesimals of order higher than 1, is the differential equation satisfied by the
difference between two neighboring (i.e., “infinitely close”) extremals. An
equation which is satisfied to within terms of the first order by the
difference between two neighboring solutions of a given differential equation
is called the variational equation (of the original differential equation).
Thus, we have just proved that Jacobi's equation is the variational equation of
Euler's equation.
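This relationship can be tested numerically. The sketch below is only an illustration under stated assumptions: the integrand F = y'²/2 + cos y is a hypothetical choice (it does not appear in the text), for which Euler's equation is y'' = -sin y and Jacobi's equation along an extremal y(x) is h'' = -cos(y) h. The code compares the normalized difference of two neighboring extremals with the solution of Jacobi's equation.

```python
import math

def rk4(f, state, x0, x1, steps=2000):
    """Integrate state' = f(x, state) from x0 to x1 with the classical RK4 scheme."""
    dx = (x1 - x0) / steps
    x = x0
    s = list(state)
    for _ in range(steps):
        k1 = f(x, s)
        k2 = f(x + dx / 2, [si + dx / 2 * ki for si, ki in zip(s, k1)])
        k3 = f(x + dx / 2, [si + dx / 2 * ki for si, ki in zip(s, k2)])
        k4 = f(x + dx, [si + dx * ki for si, ki in zip(s, k3)])
        s = [si + dx / 6 * (ka + 2 * kb + 2 * kc + kd)
             for si, ka, kb, kc, kd in zip(s, k1, k2, k3, k4)]
        x += dx
    return s

# Euler equation y'' = -sin y together with the Jacobi equation
# h'' = -cos(y) h along the same extremal; state = [y, y', h, h'].
def euler_and_jacobi(x, s):
    y, dy, h, dh = s
    return [dy, -math.sin(y), dh, -math.cos(y) * h]

def euler_only(x, s):
    y, dy = s
    return [dy, -math.sin(y)]

alpha = 1e-4  # difference of the initial slopes of the two extremals
y_end, _, h_end, _ = rk4(euler_and_jacobi, [0.0, 1.0, 0.0, 1.0], 0.0, 2.0)
y_star_end, _ = rk4(euler_only, [0.0, 1.0 + alpha], 0.0, 2.0)
difference_quotient = (y_star_end - y_end) / alpha
```

Both extremals start from the same point with slopes differing by `alpha`, so `difference_quotient` should agree with `h_end` up to terms of order `alpha`, which is the sense in which Jacobi's equation governs the difference of neighboring extremals.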

Remark. These considerations are easily extended to the case of an
arbitrary differential equation

$$F(x, y, y', \ldots, y^{(n)}) = 0 \tag{43}$$

of order n. Let y(x) and $y(x) + \delta y(x)$ be two neighboring solutions of (43).
Replacing y(x) by $y(x) + \delta y(x)$ in (43), using Taylor's formula, and bearing
in mind that y(x) satisfies (43), we obtain

$$F_y\,\delta y + F_{y'}(\delta y)' + \cdots + F_{y^{(n)}}(\delta y)^{(n)} + \varepsilon = 0,$$

where $\varepsilon$ denotes a remainder term, which is an infinitesimal of order higher
than 1 relative to $\delta y$ and its derivatives. Retaining only terms of the first
order, we obtain the linear differential equation

$$F_y\,\delta y + F_{y'}(\delta y)' + \cdots + F_{y^{(n)}}(\delta y)^{(n)} = 0,$$

satisfied by the variation $\delta y$; as before, this equation is called the variational
equation of the original equation (43). For initial conditions which are
sufficiently close to zero, this equation defines a function which is the
principal linear part of the difference between two neighboring solutions of
(43) with neighboring initial conditions.
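A first-order example with closed-form solutions shows what "principal linear part" means. The equation y' + y² = 0 used below is an assumption chosen for this sketch, not taken from the text. Here F = y' + y², so the variational equation along a solution y(x) is (δy)' + 2y δy = 0.

```python
def solution(x, y0):
    """Exact solution of y' + y^2 = 0 with y(0) = y0, valid while 1 + y0 x > 0."""
    return y0 / (1.0 + y0 * x)

def linearized_difference(x, delta):
    """Solution of the variational equation (dy)' + 2 y dy = 0 along
    y = 1 / (1 + x), with initial value delta at x = 0."""
    return delta / (1.0 + x) ** 2

delta = 1e-3  # perturbation of the initial condition
x = 1.0
exact_difference = solution(x, 1.0 + delta) - solution(x, 1.0)
error = abs(exact_difference - linearized_difference(x, delta))
```

The exact difference of the two solutions is δ / [(1 + x)(1 + (1 + δ)x)], which differs from the linearized value δ / (1 + x)² only by a term of order δ², so `error` should be far smaller than `delta` itself.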

We now return to the concept of a conjugate point. It will be recalled
that in Sec. 26 the point $\tilde{a}$ was said to be conjugate to the point a if
$h(\tilde{a}) = 0$, where h(x) is a solution of Jacobi's equation satisfying the
initial conditions h(a) = 0, $h'(a) = 1$. As just shown, the difference
z(x) = y*(x) − y(x) corresponding to two neighboring extremals y = y(x) and
y = y*(x) drawn from the same initial point must satisfy the condition

$$-\frac{d}{dx}(Pz') + Qz = o(z),$$

where o(z) is an infinitesimal of order higher than 1 relative to z and its
derivative. Hence, to within such an infinitesimal, y*(x) − y(x) is a nonzero
solution of Jacobi's equation. This leads to another definition of a
conjugate point:¹⁴

Definition 3. Given an extremal y = y(x), the point $\tilde{M} = (\tilde{a}, y(\tilde{a}))$
is said to be conjugate to the point M = (a, y(a)) if at $\tilde{M}$ the difference
y*(x) − y(x), where y = y*(x) is any neighboring extremal drawn from
the same initial point M, is an infinitesimal of order higher than 1 relative
to $\|y^*(x) - y(x)\|_1$.

Still another definition of a conjugate point is possible:

Definition 4. Given an extremal y = y(x), the point $\tilde{M} = (\tilde{a}, y(\tilde{a}))$
is said to be conjugate to the point M = (a, y(a)) if $\tilde{M}$ is the limit as
$\|y^*(x) - y(x)\|_1 \to 0$ of the points of intersection of y = y(x) and the
neighboring extremals y = y*(x) drawn from the same initial point M.

It is clear that if the point $\tilde{M}$ is conjugate to the point M in the sense of
Definition 4 (i.e., if the extremals intersect in the way described), then $\tilde{M}$ is
also conjugate to M in the sense of Definition 3. We now verify that the
converse is true, thereby establishing the equivalence of Definitions 3 and 4.
Thus, let y = y(x) be the extremal under consideration, satisfying the initial
condition

$$y(a) = A,$$

and let $y^*_\alpha(x)$ be the extremal drawn from the same initial point M = (a, A),
satisfying the condition

$$y^{*\prime}_\alpha(a) - y'(a) = \alpha.$$

Then $y^*_\alpha(x)$ can be represented in the form

$$y^*_\alpha(x) = y(x) + \alpha h(x) + \varepsilon,$$

where h(x) is a solution of the appropriate Jacobi equation, satisfying the
conditions

$$h(a) = 0, \qquad h'(a) = 1,$$

and $\varepsilon$ is a quantity of order higher than 1 relative to $\alpha$.

Now let

$$h(\tilde{a}) = 0, \qquad \beta = \sqrt{\frac{\varepsilon}{\alpha}}.$$


14 In stating this definition, we enlarge the meaning of a conjugate point to apply
to points lying on an extremal and not just their abscissas. In all these considerations,
it is tacitly assumed that $P = \frac{1}{2}F_{y'y'}$ has constant sign along the given extremal y = y(x).

It is clear that $h'(\tilde{a}) \neq 0$, since $h(x) \not\equiv 0$. Using Taylor's formula, we can
easily verify that for sufficiently small $\alpha$, the expression

$$y^*_\alpha(x) - y(x) = \alpha h(x) + \varepsilon$$

takes values with different signs at the points $\tilde{a} - \beta$ and $\tilde{a} + \beta$. Since
$\beta \to 0$ as $\alpha \to 0$, this means that $\tilde{M} = (\tilde{a}, y(\tilde{a}))$ is the limit as $\alpha \to 0$ of the
points of intersection of the extremals $y = y^*_\alpha(x)$ and the extremal y = y(x).

Example. Consider the geodesics on a sphere, i.e., the great circle arcs.
Each such arc is an extremal of the functional which gives arc length on the
sphere. The conjugate of any point M on the sphere is the diametrically
opposite point $\tilde{M}$. In fact, given an extremal, all extremals with the same
initial point M (and not just the neighboring extremals) intersect the given
extremal at $\tilde{M}$. This property stems from the fact that a sphere has
constant curvature, and is no longer true if the sphere is replaced by a
“neighboring” ellipsoid (for example).
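As a hedged check of this example, the Jacobi equation along a great circle can be computed directly. The setting below is an assumption introduced for this sketch and does not appear in the text: the unit sphere, curves given by latitude θ as a function of longitude φ, and the equator θ ≡ 0 as the extremal.

```latex
% Arc length on the unit sphere for a curve \theta = \theta(\varphi):
J[\theta] = \int \sqrt{\cos^2\theta + \theta'^2}\; d\varphi ,
\qquad F(\theta, \theta') = \sqrt{\cos^2\theta + \theta'^2}.

% Along the equator \theta \equiv 0 (so \theta' \equiv 0 and F = 1):
F_{\theta\theta} = -1, \qquad F_{\theta\theta'} = 0, \qquad F_{\theta'\theta'} = 1,
\qquad\text{so}\qquad P = \tfrac12, \quad Q = -\tfrac12 .

% Jacobi equation (42) and its normalized solution through \varphi = \varphi_0:
-\frac{d}{d\varphi}\left(\tfrac12\, h'\right) - \tfrac12\, h = 0
\;\Longleftrightarrow\; h'' + h = 0,
\qquad h(\varphi) = \sin(\varphi - \varphi_0).
```

The first zero of h after φ₀ is at φ₀ + π, i.e., at the diametrically opposite point, in agreement with the example.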

We conclude this section by summarizing the necessary conditions for an
extremum found so far: If the functional

$$\int_a^b F(x, y, y')\,dx, \qquad y(a) = A, \quad y(b) = B$$

has a weak extremum for the curve y = y(x), then

1. The curve y = y(x) is an extremal, i.e., satisfies Euler's equation

$$F_y - \frac{d}{dx}F_{y'} = 0$$

(see Sec. 4);

2. Along the curve y = y(x), $F_{y'y'} \geq 0$ for a minimum and $F_{y'y'} \leq 0$ for
a maximum (see Sec. 25);

3. The interval (a, b) contains no points conjugate to a (see Sec. 27).


28. Sufficient Conditions for a Weak Extremum 

In this section, we formulate a set of conditions which is sufficient for a 
functional of the form 

$$J[y] = \int_a^b F(x, y, y')\,dx, \qquad y(a) = A, \quad y(b) = B \tag{44}$$

to have a weak extremum for the curve y = y(x). It should be noted that
the sufficient conditions to be given below closely resemble the necessary
conditions given at the end of the preceding section. The necessary
conditions were considered separately, since each of them is necessary by itself.
However, the sufficient conditions have to be considered as a set, since the
presence of an extremum is assured only if all the conditions are satisfied
simultaneously.

Theorem. Suppose that for some admissible curve y = y(x), the
functional (44) satisfies the following conditions:

1. The curve y = y(x) is an extremal, i.e., satisfies Euler's equation

$$F_y - \frac{d}{dx}F_{y'} = 0;$$

2. Along the curve y = y(x),

$$P(x) = \frac{1}{2}F_{y'y'}[x, y(x), y'(x)] > 0$$

(the strengthened Legendre condition);

3. The interval [a, b] contains no points conjugate to the point a (the
strengthened Jacobi condition).¹⁵

Then the functional (44) has a weak minimum for y = y(x).

Proof. If the interval [a, b] contains no points conjugate to a, and if
P(x) > 0 in [a, b], then because of the continuity of the solution of
Jacobi's equation and of the function P(x), we can find a larger interval
$[a, b + \varepsilon]$ which also contains no points conjugate to a, and such that
P(x) > 0 in $[a, b + \varepsilon]$. Consider the quadratic functional

$$\int_a^b (Ph'^2 + Qh^2)\,dx - \alpha^2 \int_a^b h'^2\,dx, \tag{45}$$

with the Euler equation

$$-\frac{d}{dx}\left[(P - \alpha^2)h'\right] + Qh = 0. \tag{46}$$

Since P(x) is positive in $[a, b + \varepsilon]$ and hence has a positive (greatest)
lower bound on this interval, and since the solution of (46) satisfying the
initial conditions h(a) = 0, $h'(a) = 1$ depends continuously on the
parameter $\alpha$ for all sufficiently small $\alpha$, we have

1. $P(x) - \alpha^2 > 0$, $a \leq x \leq b$;

2. The solution of (46) satisfying the initial conditions h(a) = 0,
$h'(a) = 1$ does not vanish for $a < x \leq b$.

As shown in Theorem 1 of Sec. 26, these two conditions imply that the
quadratic functional (45) is positive definite for all sufficiently small $\alpha$.
In other words, there exists a positive number c > 0 such that

$$\int_a^b (Ph'^2 + Qh^2)\,dx > c \int_a^b h'^2\,dx. \tag{47}$$

15 The ordinary Jacobi condition states that the open interval (a, b) contains no points
conjugate to a. Cf. Jacobi's necessary condition, p. 112.

It is now an easy consequence of (47) that a minimum is actually
achieved for the given extremal. In fact, if y = y(x) is the extremal
and y = y(x) + h(x) is a sufficiently close neighboring curve, then,
according to formula (12) of Sec. 25,

$$J[y + h] - J[y] = \int_a^b (Ph'^2 + Qh^2)\,dx + \int_a^b (\xi h^2 + \eta h'^2)\,dx, \tag{48}$$

where $\xi(x), \eta(x) \to 0$ uniformly for $a \leq x \leq b$ as $\|h\|_1 \to 0$. Moreover,
using the Schwarz inequality, we have

$$h^2(x) = \left(\int_a^x h'\,dx\right)^2 \leq (x - a)\int_a^x h'^2\,dx \leq (x - a)\int_a^b h'^2\,dx,$$

i.e.,

$$\int_a^b h^2\,dx \leq \frac{(b - a)^2}{2}\int_a^b h'^2\,dx,$$

which implies that

$$\left|\int_a^b (\xi h^2 + \eta h'^2)\,dx\right| \leq \varepsilon\left(1 + \frac{(b - a)^2}{2}\right)\int_a^b h'^2\,dx \tag{49}$$

if $|\xi(x)| \leq \varepsilon$, $|\eta(x)| \leq \varepsilon$. Since $\varepsilon > 0$ can be chosen to be arbitrarily
small, it follows from (47) and (49) that

$$J[y + h] - J[y] = \int_a^b (Ph'^2 + Qh^2)\,dx + \int_a^b (\xi h^2 + \eta h'^2)\,dx > 0$$

for all sufficiently small $\|h\|_1$. Therefore, the extremal y = y(x)
actually corresponds to a weak minimum of the functional (44), in some
sufficiently small neighborhood of y = y(x). This proves the theorem,
thereby establishing sufficient conditions for a weak extremum in the
case of the “simplest” variational problem.
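The estimate obtained from the Schwarz inequality in this proof, namely that the integral of h² is at most (b − a)²/2 times the integral of h'² when h(a) = 0, can be checked on sample functions. The sketch below is an illustration with assumed data: the interval [0, 2] and the trial functions sin(kπ(x − a)/(b − a)) are hypothetical choices.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    step = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * step)
    return total * step / 3

a, b = 0.0, 2.0

def ratio(k):
    """Ratio of the integral of h^2 to the integral of h'^2 for
    h(x) = sin(k pi (x - a) / (b - a)), which vanishes at both end points."""
    w = k * math.pi / (b - a)
    num = simpson(lambda x: math.sin(w * (x - a)) ** 2, a, b)
    den = simpson(lambda x: (w * math.cos(w * (x - a))) ** 2, a, b)
    return num / den

bound = (b - a) ** 2 / 2          # constant appearing in the estimate
ratios = [ratio(k) for k in range(1, 6)]
```

For these trial functions the ratio equals (b − a)²/(kπ)², so every entry of `ratios` stays well below `bound`. The constant in the estimate is not sharp, but the proof only needs some finite constant.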


29. Generalization to n Unknown Functions 

The concept of a conjugate point and the related Jacobi conditions can
be generalized to the case where the functional under consideration depends
on n functions $y_1(x), \ldots, y_n(x)$. In this section we carry over to such
functionals the definitions and results given earlier for functionals
depending on a single function. To keep the notation simple, we write

$$J[y] = \int_a^b F(x, y, y')\,dx \tag{50}$$

as before, where now y denotes the n-dimensional vector $(y_1, \ldots, y_n)$ and y'
the n-dimensional vector $(y'_1, \ldots, y'_n)$ [cf. Sec. 20]. By the scalar product
(y, z) of two vectors

$$y = (y_1, \ldots, y_n), \qquad z = (z_1, \ldots, z_n)$$

we mean, as usual, the quantity

$$(y, z) = y_1 z_1 + \cdots + y_n z_n.$$

Whenever the transition from the case of a single function to the case of n
functions is straightforward, we shall omit details.

29.1. The second variation. The Legendre condition. If the increment
$\Delta J[h]$ of the functional (50), corresponding to the change from y to y + h,¹⁶
can be written in the form

$$\Delta J[h] = \varphi_1[h] + \varphi_2[h] + \varepsilon\|h\|^2,$$

where $\varphi_1[h]$ is a linear functional, $\varphi_2[h]$ is a quadratic functional, and
$\varepsilon \to 0$ as $\|h\| \to 0$, then $\varphi_2[h]$ is called the second variation of the
original functional (50) and is denoted by $\delta^2 J[h]$.¹⁷ In the case of fixed
end points, where

$$h_i(a) = h_i(b) = 0 \qquad (i = 1, \ldots, n),$$

or more concisely,

$$h(a) = h(b) = 0,$$

we easily find, applying Taylor's formula, that the second variation of (50)
is given by

$$\delta^2 J[h] = \frac{1}{2}\int_a^b \left[\sum_{i,k=1}^n F_{y_i y_k} h_i h_k + 2\sum_{i,k=1}^n F_{y_i y'_k} h_i h'_k + \sum_{i,k=1}^n F_{y'_i y'_k} h'_i h'_k\right] dx. \tag{51}$$

Introducing the matrices

$$F_{yy} = \|F_{y_i y_k}\|, \qquad F_{yy'} = \|F_{y_i y'_k}\|, \qquad F_{y'y'} = \|F_{y'_i y'_k}\|, \tag{52}$$

we can write (51) in the compact form

$$\delta^2 J[h] = \frac{1}{2}\int_a^b \left[(F_{yy}h, h) + 2(F_{yy'}h, h') + (F_{y'y'}h', h')\right] dx, \tag{53}$$

where each term in the integrand is the scalar product of the vector h or h'
and the vector obtained by applying one of the matrices (52) to h or h'.
Then, integrating by parts, we can reduce (53) to the form

$$\int_a^b \left[(Ph', h') + (Qh, h)\right] dx, \tag{54}$$

Ja 


16 The letter h denotes the vector $(h_1, \ldots, h_n)$, and $\|h\|$ means

$$\sum_{i=1}^n \max_{a \leq x \leq b}\left\{|h_i(x)| + |h'_i(x)|\right\} = \sum_{i=1}^n \|h_i\|_1.$$

17 Obviously, $\varphi_1[h]$ is the (first) variation of the functional (50).

where P = P(x) and Q = Q(x) are the matrices

$$P = \|P_{ik}\| = \frac{1}{2}F_{y'y'}, \qquad Q = \|Q_{ik}\| = \frac{1}{2}\left(F_{yy} - \frac{d}{dx}F_{yy'}\right).$$

In deriving (54), we assume that $F_{yy'}$ is a symmetric matrix,¹⁸ i.e., that
$F_{y_i y'_k} = F_{y_k y'_i}$ for all i, k = 1, ..., n ($F_{yy}$ and $F_{y'y'}$ are automatically
symmetric, because of the tacitly assumed smoothness of F). Just as in the case
of one unknown function, it is easily verified that the term (Ph', h') makes
the “main contribution” to the quadratic functional (54). More precisely,
we have the following result:

Theorem 1. A necessary condition for the quadratic functional (54) to
be nonnegative for all h(x) such that h(a) = h(b) = 0 is that the matrix P
be nonnegative definite.¹⁹

29.2. Investigation of the quadratic functional (54). As in Sec. 26, we can
investigate the functional (54) without reference to the original functional
(50), assuming, however, that P and Q are symmetric matrices. As before
(see Sec. 26), we begin by writing the system of Euler equations

$$-\frac{d}{dx}\sum_{i=1}^n P_{ik}h'_i + \sum_{i=1}^n Q_{ik}h_i = 0 \qquad (k = 1, \ldots, n), \tag{55}$$

corresponding to the functional (54). The equations (55) can be written
more concisely as

$$-\frac{d}{dx}(Ph') + Qh = 0, \tag{56}$$

in terms of the matrices P and Q.

Definition 1. Let

$$\begin{aligned}
h^{(1)} &= (h_{11}, h_{12}, \ldots, h_{1n}),\\
h^{(2)} &= (h_{21}, h_{22}, \ldots, h_{2n}),\\
&\;\;\vdots\\
h^{(n)} &= (h_{n1}, h_{n2}, \ldots, h_{nn})
\end{aligned} \tag{57}$$

be a set of n solutions of the system (55), where the i-th solution satisfies
the initial conditions²⁰

$$h_{ik}(a) = 0 \qquad (k = 1, \ldots, n) \tag{58}$$

and

$$h'_{ii}(a) = 1, \qquad h'_{ik}(a) = 0 \qquad (k \neq i). \tag{59}$$

18 Without this assumption, which is unnecessarily restrictive, equations (54) and (55)
become more complicated, but it can be shown that Theorems 1 and 2 remain valid
(H. Niemeyer, private communication).

19 This is the appropriate multidimensional generalization of the Legendre condition
(14), p. 103. The matrix P = P(x) is said to be nonnegative definite (positive definite)
if the quadratic form

$$\sum_{i,k=1}^n P_{ik}(x)h_i(x)h_k(x) \qquad (a \leq x \leq b)$$

is nonnegative (positive) for all x in [a, b] and arbitrary $h_1(x), \ldots, h_n(x)$.

20 Thus, the vectors $h^{(i)}(a)$ are the rows of the zero matrix of order n, and the vectors
$h^{(i)\prime}(a)$ are the rows of the unit matrix of order n.

Then the point $\tilde{a}$ ($\neq a$) is said to be conjugate to the point a if the
determinant

$$\begin{vmatrix}
h_{11}(x) & h_{12}(x) & \cdots & h_{1n}(x)\\
h_{21}(x) & h_{22}(x) & \cdots & h_{2n}(x)\\
\vdots & \vdots & & \vdots\\
h_{n1}(x) & h_{n2}(x) & \cdots & h_{nn}(x)
\end{vmatrix} \tag{60}$$

vanishes for $x = \tilde{a}$.
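The determinant criterion can be tried out on a small system. The sketch below is an illustration under stated assumptions: it handles only n = 2 with P equal to the unit matrix, the helper name `first_zero_of_determinant` is hypothetical, and it detects only zeros at which the determinant changes sign from positive to negative (a zero where the determinant merely touches zero would be missed).

```python
import math

def first_zero_of_determinant(Q, a, b, steps=20000):
    """For P = I (2 x 2), return the first x in (a, b] at which the determinant
    (60) changes sign, or None. Q(x) must return a symmetric 2 x 2 matrix as
    nested lists. (Hypothetical helper written for this illustration.)"""
    def rhs(x, s):
        # s = [h_1, h_2, h_1', h_2'] for one vector solution; with P = I the
        # system (55) reads h_k'' = sum_i Q_ik h_i.
        q = Q(x)
        return [s[2], s[3],
                q[0][0] * s[0] + q[1][0] * s[1],
                q[0][1] * s[0] + q[1][1] * s[1]]

    def rk4_step(x, s, dx):
        k1 = rhs(x, s)
        k2 = rhs(x + dx / 2, [v + dx / 2 * k for v, k in zip(s, k1)])
        k3 = rhs(x + dx / 2, [v + dx / 2 * k for v, k in zip(s, k2)])
        k4 = rhs(x + dx, [v + dx * k for v, k in zip(s, k3)])
        return [v + dx / 6 * (ka + 2 * kb + 2 * kc + kd)
                for v, ka, kb, kc, kd in zip(s, k1, k2, k3, k4)]

    # Initial conditions (58) and (59): h^(i)(a) = 0, h^(i)'(a) = i-th unit vector.
    sols = [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
    dx = (b - a) / steps
    x, d_old = a, 0.0
    for _ in range(steps):
        sols = [rk4_step(x, s, dx) for s in sols]
        x += dx
        d_new = sols[0][0] * sols[1][1] - sols[0][1] * sols[1][0]
        if d_old > 0.0 and d_new <= 0.0:
            # Zero lies in (x - dx, x]; locate it by linear interpolation.
            return x - dx * d_new / (d_new - d_old)
        d_old = d_new
    return None

# Decoupled example: Q = diag(-1, -4), so h_11 = sin x and h_22 = sin(2x) / 2.
conjugate = first_zero_of_determinant(lambda x: [[-1.0, 0.0], [0.0, -4.0]], 0.0, 3.0)
```

In this decoupled example the determinant is sin(x) sin(2x)/2, whose first zero after x = 0 is π/2, so the conjugate point of the system is set by the "stiffer" component rather than by the first one, whose own zero is at π.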

Theorem 2. If P is a positive definite symmetric matrix, and if the
interval [a, b] contains no points conjugate to a, then the quadratic functional
(54) is positive definite for all h(x) such that h(a) = h(b) = 0.

Proof. The proof of this theorem follows the same plan as the proof
of Theorem 1 of Sec. 26. Let W be an arbitrary differentiable
symmetric matrix. Then

$$0 = \int_a^b \frac{d}{dx}(Wh, h)\,dx = \int_a^b (W'h, h)\,dx + 2\int_a^b (Wh, h')\,dx$$

for every vector h satisfying the boundary conditions h(a) = h(b) = 0.
Therefore, we can add the expression

$$(W'h, h) + 2(Wh, h')$$

to the integrand of (54), obtaining

$$\int_a^b \left[(Ph', h') + 2(Wh, h') + (Qh, h) + (W'h, h)\right] dx, \tag{61}$$

without changing the value of (54).

We now try to select a matrix W such that the integrand of (61) is a
perfect square. This will be the case if W is chosen to be a solution of
the equation²¹

$$Q + W' = WP^{-1}W, \tag{62}$$

which we call the matrix Riccati equation (cf. p. 108). In fact, if we
use (62), the integrand of (61) becomes

$$(Ph', h') + 2(Wh, h') + (WP^{-1}Wh, h). \tag{63}$$

Since P is a positive definite symmetric matrix, the square root $P^{1/2}$
exists, is itself positive definite and symmetric, and has the inverse
$P^{-1/2}$. Therefore, we can write (63) as the “perfect square”

$$(P^{1/2}h' + P^{-1/2}Wh,\; P^{1/2}h' + P^{-1/2}Wh).$$

[Recall that if T is a symmetric matrix, (Ty, z) = (y, Tz) for any
vectors y and z.] Repeating the argument given in the case of a scalar
function h (see p. 107), we can show that

$$P^{1/2}h' + P^{-1/2}Wh$$

21 It can be shown that this is compatible with W being symmetric, even when $F_{yy'}$
fails to be symmetric and (62) is replaced by a more general equation (H. Niemeyer,
private communication).

cannot vanish for all x in [a, b] unless $h \equiv 0$. It follows that if the
matrix Riccati equation (62) has a solution W defined on the whole
interval [a, b], then, with this choice of W, the functional (61), and hence
the functional (54), is positive definite.

Thus, the proof of the theorem reduces to showing that the absence
of points in [a, b] which are conjugate to a guarantees that (62) has a
solution defined on the whole interval [a, b]. Making the substitution

$$W = -PU'U^{-1} \tag{64}$$

in (62), where U is a new unknown matrix [cf. (32)], we obtain the
equation

$$-\frac{d}{dx}(PU') + QU = 0, \tag{65}$$

which is just the matrix form of equation (56). The solution of (65)
satisfying the initial conditions

$$U(a) = 0, \qquad U'(a) = I,$$

where 0 is the zero matrix and I the unit matrix of order n, is precisely
the set of solutions (57) of the system (55) which satisfy the initial
conditions (58) and (59) [cf. footnote 20, p. 119]. If [a, b] contains
no points conjugate to a, we can show that (65) has a solution U(x)
whose determinant does not vanish anywhere in [a, b],²² and then
there exists a solution of (62), given by (64), which is defined on the
whole interval [a, b]. In other words, we can actually find a matrix W
which converts the integrand of the functional (61) into a perfect square,
in the way described. This completes the proof of the theorem.
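The passage from the matrix Riccati equation (62) to the linear equation (65) can be verified in a few lines. This is a worked sketch, assuming only that U is differentiable with det U ≠ 0 on the interval, and using the standard rule for differentiating an inverse matrix:

```latex
\begin{aligned}
W &= -PU'U^{-1}, \qquad (U^{-1})' = -U^{-1}U'U^{-1},\\
W' &= -(PU')'U^{-1} + PU'U^{-1}U'U^{-1},\\
WP^{-1}W &= (PU'U^{-1})\,P^{-1}\,(PU'U^{-1}) = PU'U^{-1}U'U^{-1},\\
Q + W' - WP^{-1}W &= Q - (PU')'U^{-1}
   = \left[-\frac{d}{dx}(PU') + QU\right]U^{-1}.
\end{aligned}
```

Hence (62) holds wherever U is invertible if and only if U satisfies (65), which is why a solution U with nonvanishing determinant on [a, b] is what the proof requires.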

Next we show, as in Sec. 26, that the absence of points conjugate to a 
in the interval [ a , b] is not only sufficient but also necessary for the functional 
(53) to be positive definite. 

Lemma. If

$$h(x) = (h_1(x), \ldots, h_n(x))$$

satisfies the system (55) and the boundary conditions

$$h(a) = h(b) = 0, \tag{66}$$

then

$$\int_a^b \left[(Ph', h') + (Qh, h)\right] dx = 0.$$

22 The fact that det P does not vanish in [a, b] is tacitly assumed, but this is guaranteed
by the positive definiteness of P (cf. footnote 9, p. 108).

Proof. The lemma is an immediate consequence of the formula

$$0 = \int_a^b \left(-\frac{d}{dx}(Ph') + Qh,\; h\right) dx = \int_a^b \left[(Ph', h') + (Qh, h)\right] dx,$$

which is obtained by integrating by parts and using (66).

Theorem 3. If the quadratic functional

$$\int_a^b \left[(Ph', h') + (Qh, h)\right] dx, \tag{67}$$

where P is a positive definite symmetric matrix, is positive definite for
all h(x) such that h(a) = h(b) = 0, then the interval [a, b] contains no
points conjugate to a.

Proof. The proof of this theorem follows the same plan as the proof
of the corresponding theorem for the case of one unknown function
(Theorem 2 of Sec. 26). We consider the positive definite quadratic
functional

$$\int_a^b \left\{t[(Ph', h') + (Qh, h)] + (1 - t)(h', h')\right\} dx. \tag{68}$$

The system of Euler equations corresponding to (68) is

$$-\frac{d}{dx}\left[t\sum_{i=1}^n P_{ik}h'_i + (1 - t)h'_k\right] + t\sum_{i=1}^n Q_{ik}h_i = 0 \qquad (k = 1, \ldots, n) \tag{69}$$

[cf. (37)], which for t = 1 reduces to the system (55), and for t = 0
reduces to the system

$$h''_k = 0 \qquad (k = 1, \ldots, n).$$

Suppose the interval [a, b] contains a point $\tilde{a}$ conjugate to a, i.e., suppose
the determinant (60) vanishes for $x = \tilde{a}$. Then there exists a linear
combination h(x) of the solutions (57) which is not identically zero such
that $h(\tilde{a}) = 0$. Moreover, there exists a nontrivial solution h(x, t) of
the system (69) which depends continuously on t and reduces to h(x)
for t = 1. It is clear that $\tilde{a} \neq b$, since otherwise, according to the lemma,
the positive definite functional (67) would vanish for $h(x) \not\equiv 0$, which
is impossible. The fact that $\tilde{a}$ cannot be an interior point of [a, b] is
proved by the same kind of argument as used in Theorem 2 of Sec. 26,
for the case of a scalar function h(x). Further details are left to the
reader.

Suppose now that we only require that the functional (67) be nonnegative. 
Then, by the same argument as used to prove Theorem 2' of Sec. 26, we have 

Theorem 3'. If the quadratic functional 

$$\int_a^b [(Ph', h') + (Qh, h)]\,dx,$$

where P is a positive definite symmetric matrix, is nonnegative for all h(x) 
such that h(a) = h(b) = 0, then the interval [a, b] contains no interior 
points conjugate to a. 

Finally, combining Theorems 2 and 3, we obtain 

Theorem 4. The quadratic functional 

$$\int_a^b [(Ph', h') + (Qh, h)]\,dx,$$

where P is a positive definite symmetric matrix, is positive definite for all 
h(x) such that h(a) = h(b) = 0 if and only if the interval [a, b] contains 
no point conjugate to a. 

29.3. Jacobi’s necessary condition. More on conjugate points. We now 
apply the results just obtained to the original functional 

$$J[y] = \int_a^b F(x, y, y')\,dx, \quad y(a) = M_0, \quad y(b) = M_1, \tag{70}$$

where M_0 and M_1 are two fixed points, recalling that the second variation of 
(70) is given by 

$$\int_a^b [(Ph', h') + (Qh, h)]\,dx, \tag{71}$$

where 

$$P = \tfrac{1}{2}F_{y'y'}, \quad Q = \tfrac{1}{2}\Big(F_{yy} - \frac{d}{dx}F_{yy'}\Big). \tag{72}$$

Definition 2. The system of Euler equations 

$$-\frac{d}{dx}\sum_{i=1}^{n} P_{ik}h_i' + \sum_{i=1}^{n} Q_{ik}h_i = 0 \quad (k = 1, \ldots, n),$$

or more concisely 

$$-\frac{d}{dx}(Ph') + Qh = 0, \tag{73}$$

of the quadratic functional (71) is called the Jacobi system of the original 
functional (70).²³ 

Definition 3. The point ã is said to be conjugate to the point a with 
respect to the functional (70) if it is conjugate to a with respect to the 
quadratic functional (71) which is the second variation of the functional 
(70), i.e., if it is conjugate to a in the sense of Definition 1, p. 119. 

Since nonnegativity of the second variation is a necessary condition for 
the functional (70) to have a minimum (see Theorem 1 of Sec. 24), Theorem 3' 
immediately implies 


23 Equations (70)-(73) closely resemble equations (39)-(42) of Sec. 27, except that 
h, h' are now vectors, and P, Q are now matrices. 
Theorem 5 (Jacobi's necessary condition). If the extremal 

$$y_1 = y_1(x), \ldots, y_n = y_n(x)$$

corresponds to a minimum of the functional (70), and if the matrix 

$$F_{y'y'}[x, y(x), y'(x)]$$

is positive definite along this extremal, then the open interval (a, b) contains 
no points conjugate to a. 

So far, we have said that the point ã is conjugate to a if the determinant 
formed from n linearly independent solutions of the Jacobi system, satisfying 
certain initial conditions, vanishes for x = ã. As in the case n = 1, this 
basic definition is equivalent to two others, which involve only extremals of 
the functional (70), and not solutions of the Jacobi system: 
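
The determinant form of the definition lends itself to a direct numerical search. The following is a minimal sketch under explicit assumptions, not the authors' procedure: it takes n = 2, assumes P is the identity matrix (so the Jacobi system reads h'' = Q(x)h), and interprets the "certain initial conditions" as H(a) = 0, H'(a) = I for the matrix H whose columns are the n solutions (the precise conditions are given on p. 119, outside this passage). The function name and the Runge-Kutta integration are illustrative.

```python
import math

def first_conjugate_point(Q, a, b, steps=4000):
    """First sign change of det H(x) on (a, b], where the columns of the 2x2
    matrix H solve h'' = Q(x) h with H(a) = 0, H'(a) = I (assumed conditions).
    Returns None if no sign change is found; zeros of even order are missed."""
    step = (b - a) / steps

    def rhs(x, s):
        # s = [H00, H01, H10, H11, V00, V01, V10, V11], where V = H'
        q = Q(x)
        H, V = s[:4], s[4:]
        QH = [q[0][0] * H[0] + q[0][1] * H[2], q[0][0] * H[1] + q[0][1] * H[3],
              q[1][0] * H[0] + q[1][1] * H[2], q[1][0] * H[1] + q[1][1] * H[3]]
        return V + QH

    def shifted(s, k, c):
        return [si + c * ki for si, ki in zip(s, k)]

    s = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0]
    x = a
    prev_det = None
    for _ in range(steps):
        # One classical fourth-order Runge-Kutta step.
        k1 = rhs(x, s)
        k2 = rhs(x + step / 2, shifted(s, k1, step / 2))
        k3 = rhs(x + step / 2, shifted(s, k2, step / 2))
        k4 = rhs(x + step, shifted(s, k3, step))
        s = [si + step / 6 * (c1 + 2 * c2 + 2 * c3 + c4)
             for si, c1, c2, c3, c4 in zip(s, k1, k2, k3, k4)]
        x += step
        d = s[0] * s[3] - s[1] * s[2]
        if prev_det is not None and prev_det > 0 and d <= 0:
            # Linear interpolation between the last two grid points.
            return (x - step) + step * prev_det / (prev_det - d)
        prev_det = d
    return None

# Hypothetical example: Q = diag(-1, -4), so det H = sin(x) * sin(2x) / 2.
conjugate = first_conjugate_point(lambda x: [[-1.0, 0.0], [0.0, -4.0]], 0.0, 3.0)
none_found = first_conjugate_point(lambda x: [[1.0, 0.0], [0.0, 1.0]], 0.0, 3.0)
```

In the first example the determinant first vanishes at x = π/2, so `conjugate` should be close to that value; in the second, det H = sinh² x never vanishes for x > 0 and the search returns `None`.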

Definition 4. Suppose n neighboring extremals 

$$y_1 = y_{i1}(x), \ldots, y_n = y_{in}(x) \quad (i = 1, \ldots, n)$$

start from the same n-dimensional point, with directions which are close 
together but linearly independent. Then the point ã is said to be conjugate 
to the point a if the value of the determinant 

$$\begin{vmatrix}
y_{11}(x) & y_{12}(x) & \cdots & y_{1n}(x) \\
y_{21}(x) & y_{22}(x) & \cdots & y_{2n}(x) \\
\vdots & \vdots & & \vdots \\
y_{n1}(x) & y_{n2}(x) & \cdots & y_{nn}(x)
\end{vmatrix}$$

for x = ã is an infinitesimal whose order is higher than that of its values 
for a < x < ã. 

In the next definition, we enlarge the meaning of a conjugate point to 
apply to points lying on extremals (cf. footnote 14, p. 114). 

Definition 5. Given an extremal γ with equations 

$$y_1 = y_1(x), \ldots, y_n = y_n(x),$$

the point 

$$\tilde{M} = (\tilde{a}, y_1(\tilde{a}), \ldots, y_n(\tilde{a}))$$

is said to be conjugate to the point 

$$M = (a, y_1(a), \ldots, y_n(a))$$

if γ has a sequence of neighboring extremals drawn from the same initial 
point M, such that each neighboring extremal intersects γ and the points 
of intersection have M̃ as their limit. 

The equivalence of all these definitions of a conjugate point is proved by 
using considerations similar to those given for the case of a single unknown 
function (see Sec. 27). 
29.4. Sufficient conditions for a weak extremum. Theorem 2 and an 
argument like that used to prove the corresponding theorem of Sec. 28 
(for the scalar case) imply 

Theorem 6. Suppose that for some admissible curve γ with equations 

$$y_1 = y_1(x), \ldots, y_n = y_n(x),$$

the functional (70) satisfies the following conditions: 

1. The curve γ is an extremal, i.e., satisfies the system of Euler equations 

$$F_{y_i} - \frac{d}{dx}F_{y_i'} = 0 \quad (i = 1, \ldots, n);$$

2. Along γ the matrix 

$$P(x) = \tfrac{1}{2}F_{y'y'}[x, y(x), y'(x)]$$

is positive definite; 

3. The interval [a, b] contains no points conjugate to the point a. 

Then the functional (70) has a weak minimum for the curve γ. 

30. Connection between Jacobi’s Condition and the Theory of 
Quadratic Forms²⁴ 

According to Theorem 3 of Sec. 26, the quadratic functional 

$$\int_a^b (Ph'^2 + Qh^2)\,dx, \tag{74}$$

where 

$$P(x) > 0 \quad (a \le x \le b),$$

is positive definite for all h(x) such that h(a) = h(b) = 0 if and only if the 
interval [a, b] contains no points conjugate to a.²⁵ The functional (74) 
is the infinite-dimensional analog of a quadratic form. Therefore, to obtain 
conditions for (74) to be positive definite, it is natural to start from the 
conditions for a quadratic form defined on an n-dimensional space to be 
positive definite, and then take the limit as n → ∞. 

This may be done as follows: By introducing the points 

$$a = x_0, x_1, \ldots, x_n, x_{n+1} = b,$$

we divide the interval [a, b] into n + 1 equal parts of length 

$$\Delta x = x_{i+1} - x_i = \frac{b - a}{n + 1} \quad (i = 0, 1, \ldots, n).$$

24 Like Sec. 29, this section is written in a somewhat more concise style than the rest of 
the book, and can be omitted without loss of continuity. 

25 This is the strengthened Jacobi condition (see p. 116). 
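Consider now the quadratic form 

$$\sum_{i=0}^{n}\left[P_i\left(\frac{h_{i+1} - h_i}{\Delta x}\right)^2 + Q_i h_i^2\right]\Delta x, \tag{75}$$

where $P_i$, $Q_i$ and $h_i$ are the values of the functions P(x), Q(x) and h(x) at 
the point $x = x_i$. This quadratic form is a “finite-dimensional 
approximation” to the functional (74). Grouping similar terms and bearing in 
mind that 

$$h_0 = h(a) = 0, \quad h_{n+1} = h(b) = 0,$$

we can write (75) as 

$$\sum_{i=1}^{n}\left[\left(Q_i\,\Delta x + \frac{P_{i-1} + P_i}{\Delta x}\right)h_i^2 - 2\,\frac{P_{i-1}}{\Delta x}\,h_{i-1}h_i\right]. \tag{76}$$

In other words, the quadratic functional (74) can be approximated by a 
quadratic form in n variables $h_1, \ldots, h_n$, with the n × n matrix 

$$\begin{pmatrix}
a_1 & b_1 & 0 & \cdots & 0 & 0 & 0 \\
b_1 & a_2 & b_2 & \cdots & 0 & 0 & 0 \\
0 & b_2 & a_3 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & b_{n-2} & a_{n-1} & b_{n-1} \\
0 & 0 & 0 & \cdots & 0 & b_{n-1} & a_n
\end{pmatrix}, \tag{77}$$

where 

$$a_i = Q_i\,\Delta x + \frac{P_{i-1} + P_i}{\Delta x} \quad (i = 1, \ldots, n) \tag{78}$$

and 

$$b_i = -\frac{P_i}{\Delta x} \quad (i = 1, \ldots, n - 1). \tag{79}$$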

A symmetric matrix like (77), all of whose elements vanish except those 
appearing on the principal diagonal and on the two adjoining diagonals, 
is called a Jacobi matrix, and a quadratic form with such a matrix is called 
a Jacobi form. For any Jacobi matrix, there is a recurrence relation between 
the descending principal minors, i.e., between the determinants 

$$D_i = \begin{vmatrix}
a_1 & b_1 & 0 & \cdots & 0 & 0 & 0 \\
b_1 & a_2 & b_2 & \cdots & 0 & 0 & 0 \\
0 & b_2 & a_3 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & b_{i-2} & a_{i-1} & b_{i-1} \\
0 & 0 & 0 & \cdots & 0 & b_{i-1} & a_i
\end{vmatrix}, \tag{80}$$
where i = 1, ..., n. In fact, expanding $D_i$ with respect to the elements of 
the last row, we obtain the recursion relation 

$$D_i = a_i D_{i-1} - b_{i-1}^2 D_{i-2}, \tag{81}$$

which allows us to determine the minors $D_3, \ldots, D_n$ in terms of the first two 
minors $D_1$ and $D_2$. Moreover, if we set $D_0 = 1$, $D_{-1} = 0$, then (81) is 
valid for all i = 1, ..., n, and uniquely determines $D_1, \ldots, D_n$. 
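
The recursion (81) is easy to try out. The sketch below is an illustration only (the function names are hypothetical): it computes the descending principal minors of a Jacobi matrix from its diagonal and off-diagonal entries, and includes a brute-force determinant so that the recursion can be compared with direct expansion on small matrices.

```python
def descending_minors(diag, off):
    """Minors D_1, ..., D_n of the Jacobi matrix with diagonal a_1..a_n (diag)
    and off-diagonal b_1..b_{n-1} (off), using D_i = a_i D_{i-1} - b_{i-1}^2 D_{i-2}."""
    d_prev2, d_prev1 = 0, 1  # D_{-1} = 0, D_0 = 1
    minors = []
    for i, a_i in enumerate(diag):
        b_sq = off[i - 1] ** 2 if i > 0 else 0
        d_i = a_i * d_prev1 - b_sq * d_prev2
        minors.append(d_i)
        d_prev2, d_prev1 = d_prev1, d_i
    return minors

def jacobi_matrix(diag, off):
    """Full symmetric tridiagonal matrix, as in (77)."""
    n = len(diag)
    m = [[0] * n for _ in range(n)]
    for i in range(n):
        m[i][i] = diag[i]
    for i in range(n - 1):
        m[i][i + 1] = m[i + 1][i] = off[i]
    return m

def determinant(m):
    """Cofactor expansion along the first row (only sensible for small matrices)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * determinant(minor)
    return total

example = descending_minors([2, 2, 2], [-1, -1])
```

For the matrix with 2 on the diagonal and -1 on the adjoining diagonals, the recursion gives the minors 2, 3, 4, all positive.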

According to a familiar result, sometimes called the Sylvester criterion, 
a quadratic form 

$$\sum_{i,k=1}^{n} a_{ik}\xi_i\xi_k \quad (a_{ki} = a_{ik})$$

is positive definite if and only if the descending principal minors 

$$a_{11}, \quad
\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}, \quad
\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}, \quad \ldots, \quad \det\|a_{ik}\|$$

of the matrix $\|a_{ik}\|$ are all positive.²⁶ Applied to the present problem, 
this criterion states that the Jacobi form (76), with matrix (77), is positive 
definite if and only if all the quantities $D_i$ defined by (81) are positive, where 
i = 1, ..., n and $D_0 = 1$, $D_{-1} = 0$. 

We now use this result to obtain a criterion for the quadratic functional 
(74) to be positive definite. Thus, we examine what happens to the 
recurrence relation (81) as n → ∞. Substituting for the coefficients $a_i$ and $b_i$ 
from (78) and (79), we can write (81) in the form 

$$D_i = \left(Q_i\,\Delta x + \frac{P_{i-1} + P_i}{\Delta x}\right)D_{i-1} - \frac{P_{i-1}^2}{(\Delta x)^2}\,D_{i-2} \quad (i = 1, \ldots, n). \tag{82}$$

It is obviously impossible to pass directly to the limit n → ∞ (i.e., Δx → 0) 
in (82), since then the coefficients of $D_{i-1}$ and $D_{i-2}$ become infinite. To 
avoid this difficulty, we make the “change of variables”²⁷ 

$$D_i = \frac{P_1 \cdots P_i\,Z_{i+1}}{(\Delta x)^{i+1}} \quad (i = 1, \ldots, n),$$
$$D_0 = \frac{Z_1}{\Delta x} = 1, \tag{83}$$
$$D_{-1} = Z_0 = 0.$$

26 See e.g., G. E. Shilov, op. cit., Theorem 27, p. 131. 

27 Substituting the expressions (78) and (79) into (80), we find by direct calculation 
that $D_i$ is of order $(\Delta x)^{-i}$, and hence that $Z_i$ is of order Δx. 
In terms of the variables $Z_i$, the recurrence relation (82) becomes 

$$\frac{P_1 \cdots P_i\,Z_{i+1}}{(\Delta x)^{i+1}} = \left(Q_i\,\Delta x + \frac{P_{i-1} + P_i}{\Delta x}\right)\frac{P_1 \cdots P_{i-1}\,Z_i}{(\Delta x)^{i}} - \frac{P_{i-1}^2}{(\Delta x)^2}\,\frac{P_1 \cdots P_{i-2}\,Z_{i-1}}{(\Delta x)^{i-1}},$$

i.e., 

$$Q_i Z_i(\Delta x)^2 + P_{i-1}Z_i + P_i Z_i - P_i Z_{i+1} - P_{i-1}Z_{i-1} = 0,$$

or 

$$Q_i Z_i - \frac{1}{\Delta x}\left(P_i\,\frac{Z_{i+1} - Z_i}{\Delta x} - P_{i-1}\,\frac{Z_i - Z_{i-1}}{\Delta x}\right) = 0 \quad (i = 1, \ldots, n). \tag{84}$$

Passing to the limit Δx → 0 in (84), we obtain the differential equation 

$$-\frac{d}{dx}(PZ') + QZ = 0, \tag{85}$$

which is just the Jacobi equation! 

The condition that the quantities $D_i$ satisfying the relation (82) be positive 
is equivalent to the condition that the quantities $Z_i$ satisfying the difference 
equation (84) be positive, since the factor 

$$\frac{P_1 \cdots P_i}{(\Delta x)^{i+1}}$$

is always positive [because of the condition P(x) > 0]. Thus, we have proved 
that the quadratic form (76) is positive definite if and only if all but the first 
of the n + 2 quantities $Z_0, Z_1, \ldots, Z_{n+1}$ satisfying the difference equation 
(84) are positive.²⁸ 
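
The criterion just stated can be tried numerically. The following is a minimal sketch, assuming the starting values $Z_0 = 0$, $Z_1 = \Delta x$ and the difference equation (84) solved for $Z_{i+1}$; the function names and the choice of n are illustrative, and the test functional with P = 1, Q = -1 is a hypothetical example rather than one taken from this section.

```python
import math

def z_sequence(P, Q, a, b, n):
    """Z_0, ..., Z_{n+1} from (84), written as
    P_i Z_{i+1} = Q_i Z_i dx^2 + (P_{i-1} + P_i) Z_i - P_{i-1} Z_{i-1}."""
    dx = (b - a) / (n + 1)
    x = [a + i * dx for i in range(n + 2)]
    Z = [0.0, dx]  # Z_0 = 0, Z_1 = dx
    for i in range(1, n + 1):
        p_i, p_prev, q_i = P(x[i]), P(x[i - 1]), Q(x[i])
        Z.append((q_i * dx * dx * Z[i] + (p_prev + p_i) * Z[i] - p_prev * Z[i - 1]) / p_i)
    return Z

def jacobi_form_is_positive_definite(P, Q, a, b, n=400):
    """True if Z_1, ..., Z_{n+1} are all positive, i.e. the form (76) is positive definite."""
    return all(z > 0 for z in z_sequence(P, Q, a, b, n)[1:])

# Hypothetical example: P = 1, Q = -1, i.e. the functional with integrand h'^2 - h^2.
one = lambda x: 1.0
minus_one = lambda x: -1.0
short_interval = jacobi_form_is_positive_definite(one, minus_one, 0.0, 3.0)
long_interval = jacobi_form_is_positive_definite(one, minus_one, 0.0, 3.3)
```

For this example the Jacobi equation is Z'' + Z = 0, whose solution with Z(0) = 0, Z'(0) = 1 is sin x, so the discrete test should succeed on [0, 3.0] and fail on [0, 3.3], where the interval extends past π.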

If we consider the polygonal line $\Pi_n$ with vertices 

$$(a, Z_0), (x_1, Z_1), \ldots, (b, Z_{n+1})$$

(recall that $a = x_0$, $b = x_{n+1}$), the condition that $Z_0 = 0$ and $Z_i > 0$ for 
i = 1, ..., n + 1 means that $\Pi_n$ does not intersect the interval [a, b] except 
at the end point a. As Δx → 0, the difference equation (84) goes into 
the Jacobi differential equation (85), and the polygonal line $\Pi_n$ goes into a 
nontrivial solution of (85) which satisfies the initial condition 

$$Z(a) = Z_0 = 0, \quad Z'(a) = \lim_{\Delta x \to 0}\frac{Z_1 - Z_0}{\Delta x} = \lim_{\Delta x \to 0}\frac{\Delta x}{\Delta x} = 1$$

and does not vanish for a < x ≤ b. In other words, as n → ∞, the Jacobi 
form (76) goes into the quadratic functional (74), and the condition that (76) 


28 Note that $Z_0 = 0$, $Z_1 = \Delta x > 0$, according to (83). Note also that these two 
equations, together with the n equations (84), form a system of n + 2 independent 
linear equations in n + 2 unknowns, and that such a system always has a unique 
solution. 
be positive definite goes into precisely the condition for (74) to be positive 
definite given in Theorem 3 of Sec. 26, i.e., the condition that [a, b] contain 
no points conjugate to a. The legitimacy of this passage to the limit can 
be made completely rigorous, but we omit the details. 

PROBLEMS 

1. Calculate the second variation of each of the following functionals: 

a) $J[y] = \int_a^b F(x, y)\,dx$; 

b) $J[y] = \int_a^b F(x, y, y', \ldots, y^{(n)})\,dx$; 

c) $J[u] = \iint_R F(x, y, u, u_x, u_y)\,dx\,dy$. 

2. Show that the second variation of a linear functional is zero. State and 
prove a converse result. 

3. Prove that a quadratic functional is twice differentiable, and find its first 
and second variations. 

4. Calculate the second variation of the functional 

$$e^{J[y]},$$

where J[y] is a twice differentiable functional. 

Ans. $\delta^2 e^{J[y]} = [(\delta J)^2 + \delta^2 J]\,e^{J[y]}$. 

5. Give an example showing that in Theorem 2 of Sec. 24, we cannot replace 
the condition that $\delta^2 J[h]$ be strongly positive by the condition that $\delta^2 J[h] > 0$. 

6. Derive the analog of Legendre’s necessary condition for functionals of the 
form 

$$J[u] = \iint_R F(x, y, u, u_x, u_y)\,dx\,dy,$$

where u vanishes on the boundary of R. 

Ans. The matrix 

$$\begin{Vmatrix} F_{u_x u_x} & F_{u_x u_y} \\ F_{u_y u_x} & F_{u_y u_y} \end{Vmatrix}$$

should be nonnegative definite (cf. p. 119). 

7. For which values of a and b is the quadratic functional 

$$\int_0^a [f'^2(x) - b f^2(x)]\,dx$$

nonnegative for all f(x) such that f(0) = f(a) = 0? Deduce an inequality 
from the answer. 

8. Show that the extremals of any functional of the form 

$$\int_a^b F(x, y')\,dx$$

have no conjugate points. 
9. Prove that if a family of extremals drawn from a given point A has an 
envelope E, then the point where a given extremal touches E is a conjugate 
point of A. 

10. Investigate the extremals of the functional 

$$J[y] = \int_0^a \frac{y}{y'^2}\,dx, \quad y(0) = 1, \quad y(a) = A,$$

where 0 < a, 0 < A < 1. Show that two extremals go through every pair 
of points (0, 1) and (a, A). Which of these two extremals corresponds to 
a weak minimum? 

Hint. The line x = 0 is an envelope of the family of extremals. 

11. Prove that the extremal $y = y_1 x / x_1$ corresponds to a weak minimum of 
both functionals 

$$\int_0^{x_1} \frac{dx}{y'}, \qquad \int_0^{x_1} \frac{dx}{y'^2},$$

where $y(0) = 0$, $y(x_1) = y_1$, $x_1 > 0$, $y_1 > 0$. 

12. What is the restriction on a if the functional 

$$\int_0^a (y'^2 - y^2)\,dx, \quad y(0) = 0, \quad y(a) = 0$$

is to satisfy the strengthened Jacobi condition? Use two approaches, one 
based on Jacobi’s equation (42) and the other based on Definition 4 (p. 114) 
of a conjugate point. 

13. Is the strengthened Jacobi condition satisfied by the functional 

$$J[y] = \int_0^a (y'^2 + y^2 + x^2)\,dx, \quad y(0) = 0, \quad y(a) = 0$$

for arbitrary a? 

Ans. Yes. 

14. Let y = y(x, α, β) be a general solution of Euler’s equation, depending on 
two parameters α and β. Prove that if the ratio 

$$\frac{\partial y / \partial \alpha}{\partial y / \partial \beta}$$

is the same at two points, the points are conjugate. 

15. Consider the catenary 

$$y = c \cosh\frac{x + b}{c},$$

where b and c are constants. Show that any point on the catenary except 
the vertex (-b, c) has one and only one conjugate, and show that the tangents 
to any pair of conjugate points intersect on the x-axis. 
FIELDS. SUFFICIENT CONDITIONS 
FOR A STRONG EXTREMUM 


In our study of sufficient conditions for a weak extremum, we introduced 
the important concept of a conjugate point. The simplest and most natural 
way to introduce this concept is based on the use of families of neighboring 
extremals (see Sec. 27). Then the conjugate of a point M lying on an extremal 
γ is defined as the limit of the points of intersection of γ with the neighboring 
extremals drawn from M. 

The utility of studying families of extremals rather than individual extremals 
is particularly apparent when we turn our attention to the problem of finding 
sufficient conditions for a strong extremum. The study of such families of 
extremals is intimately connected with the important concept of a field, 
which we introduce in the next section. Since the concept of a field is 
useful in many problems, we first give a general definition of a field, which is 
not directly related to variational problems. 


31. Consistent Boundary Conditions. General Definition 
of a Field 

Consider a system of second-order differential equations 

$$y_i'' = f_i(x, y_1, \ldots, y_n, y_1', \ldots, y_n') \quad (i = 1, \ldots, n), \tag{1}$$

solved explicitly for the second derivatives. In order to single out a definite 
solution of this system, we have to specify 2n conditions, e.g., boundary 
conditions of the form 

$$y_i' = \psi_i(y_1, \ldots, y_n) \quad (i = 1, \ldots, n) \tag{2}$$

for two values of x, say $x_1$ and $x_2$. Boundary conditions of this kind are 
commonly encountered in variational problems. If we require that the 
boundary conditions (2) hold only at one point, they determine a solution 
of the system (1) which depends on n parameters. 

We now introduce the following definitions: 

Definition 1. The boundary conditions 

$$y_i' = \psi_i^{(1)}(y_1, \ldots, y_n) \quad (i = 1, \ldots, n), \tag{3}$$

prescribed for $x = x_1$, and the boundary conditions 

$$y_i' = \psi_i^{(2)}(y_1, \ldots, y_n) \quad (i = 1, \ldots, n), \tag{4}$$

prescribed for $x = x_2$, are said to be (mutually) consistent if every solution 
of the system (1) satisfying the boundary conditions (3) at $x = x_1$ also 
satisfies the boundary conditions (4) at $x = x_2$, and conversely.¹ 

Definition 2. Suppose the boundary conditions 

$$y_i' = \psi_i(x, y_1, \ldots, y_n) \quad (i = 1, \ldots, n) \tag{5}$$

(where the $\psi_i$ are continuously differentiable functions) are prescribed for 
every x in the interval [a, b], and suppose they are consistent for every 
pair of points $x_1$, $x_2$ in [a, b]. Then the family of mutually consistent 
boundary conditions (5) is called a field (of directions) for the given 
system (1). 

As is clear from (5), boundary conditions prescribed for every value of x 
define a system of first-order differential equations. The requirement that 
the boundary conditions be consistent for different values of x means that 
the solutions of the system (5) must also satisfy the system (1), i.e., that (1) 
is implied by (5). 

Because of the existence and uniqueness theorem for systems of differential 
equations,² one and only one integral curve of the system (5) passes through 


1 Thus, one might say that the boundary conditions at $x_1$ can be replaced by the 
boundary conditions at $x_2$ which are consistent with those at $x_1$. In a boundary value 
problem, the boundary conditions represent the influence of the external medium. 
But in every concrete problem, we are at liberty to decide what is taken to be the external 
medium and what is taken to be the system under consideration. For example, in 
studying a vibrating string, subject to certain boundary conditions at its end points, 
we can focus our attention on a part of the string, instead of the whole string, regarding 
the rest of the string as part of the external medium and replacing the effect of the 
“discarded” part of the string by suitable boundary conditions at the end points of the 
“retained” part of the string. 

2 See e.g., E. A. Coddington, op. cit., Chap. 6. 
each point $(x, y_1, \ldots, y_n)$ of the region R where the functions $\psi_i(x, y_1, \ldots, y_n)$ 
are defined. According to what has just been said, each of these curves is 
at the same time a solution of the system (1). Thus, specifying a field (5) 
of the system (1) in some region R defines an n-parameter family of solutions 
of (1), such that one and only one curve from the family passes through each 
point of R. The curves of the family will be called trajectories of the field.³ 

The following theorem gives conditions which must be satisfied by the 
functions $\psi_i(x, y_1, \ldots, y_n)$, 1 ≤ i ≤ n, if the system (5) is to be a field 
for the system (1): 

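Theorem. The first-order system 

$$y_i' = \psi_i(x, y_1, \ldots, y_n) \quad (a \le x \le b;\ 1 \le i \le n) \tag{6}$$

is a field for the second-order system 

$$y_i'' = f_i(x, y_1, \ldots, y_n, y_1', \ldots, y_n') \tag{7}$$

if and only if the functions $\psi_i(x, y_1, \ldots, y_n)$ satisfy the following system 
of partial differential equations, called the Hamilton-Jacobi system⁴ for 
the original system (7): 

$$\frac{\partial \psi_i}{\partial x} + \sum_{k=1}^{n}\frac{\partial \psi_i}{\partial y_k}\,\psi_k = f_i(x, y_1, \ldots, y_n, \psi_1, \ldots, \psi_n). \tag{8}$$

Thus, every solution of the Hamilton-Jacobi system (8) gives a field for 
the original system (7). 

Proof. Differentiating (6) with respect to x, we obtain 

$$\frac{d^2 y_i}{dx^2} = \frac{\partial \psi_i}{\partial x} + \sum_{k=1}^{n}\frac{\partial \psi_i}{\partial y_k}\frac{dy_k}{dx},$$

i.e., 

$$y_i'' = \frac{\partial \psi_i}{\partial x} + \sum_{k=1}^{n}\frac{\partial \psi_i}{\partial y_k}\,\psi_k.$$

Thus, the system (7) is a consequence of the system (6) if and only if 
(8) holds. 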

Example 1. Consider a single linear differential equation 

$$y'' = p(x)y. \tag{9}$$


3 A field is usually defined not as a family of boundary conditions which are compatible 
at every two points, but as a set of integral curves of the system (1) which satisfy the 
conditions (5) at every point, i.e., as a general solution of the system (5). However, 
it seems to us that our definition has certain advantages, in particular, when applying 
the concept of a field to variational problems involving multiple integrals. 

4 For an explanation of the connection between the system (8) and the Hamilton- 
Jacobi equation defined in Chapter 4, see the remark on p. 143. 
The corresponding Hamilton-Jacobi system reduces to a single equation 

$$\frac{\partial \psi}{\partial x} + \frac{\partial \psi}{\partial y}\,\psi = p(x)y,$$

i.e., 

$$\frac{\partial \psi}{\partial x} + \frac{1}{2}\frac{\partial \psi^2}{\partial y} = p(x)y. \tag{10}$$

The set of solutions of (10) depends on an arbitrary function, and according 
to the theorem, each of these solutions is a field for equation (9). 

The simplest solutions of (10) are those that are linear in y: 

$$\psi(x, y) = \alpha(x)y. \tag{11}$$

Substituting (11) into (10), we obtain 

$$\alpha'(x)y + \alpha^2(x)y = p(x)y.$$

Thus, α(x) satisfies the Riccati equation 

$$\alpha'(x) + \alpha^2(x) = p(x). \tag{12}$$

Solving (12) and setting 

$$y' = \alpha(x)y,$$

we obtain a field (which is linear in y) for the differential equation (9). 
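
A concrete instance of Example 1 can be verified numerically. The sketch below is illustrative only: it assumes p(x) = 1, for which α(x) = tanh x solves the Riccati equation (12), and checks by central differences that ψ(x, y) = tanh(x)·y satisfies the Hamilton-Jacobi equation (10) at a sample point. The function name, the sample point and the step `eps` are arbitrary choices.

```python
import math

def hamilton_jacobi_residual(psi, p, x, y, eps=1e-5):
    """Left side minus right side of (10): psi_x + psi * psi_y - p(x) * y,
    with the partial derivatives approximated by central differences."""
    dpsi_dx = (psi(x + eps, y) - psi(x - eps, y)) / (2 * eps)
    dpsi_dy = (psi(x, y + eps) - psi(x, y - eps)) / (2 * eps)
    return dpsi_dx + dpsi_dy * psi(x, y) - p(x) * y

p = lambda x: 1.0
field = lambda x, y: math.tanh(x) * y   # alpha(x) = tanh x solves alpha' + alpha^2 = 1
not_a_field = lambda x, y: x * y        # alpha(x) = x does not

residual_field = hamilton_jacobi_residual(field, p, 0.7, 1.3)
residual_other = hamilton_jacobi_residual(not_a_field, p, 0.7, 1.3)
```

The first residual should vanish up to discretization error, while the second equals x²y at the sample point and is clearly nonzero. The trajectories of the field y' = tanh(x)·y are y = C cosh x, which indeed satisfy y'' = y.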

Example 2. In the same way, we can find the simplest field for a system 
of linear differential equations 

$$Y'' = P(x)Y, \tag{13}$$

where $Y = (y_1, \ldots, y_n)$ and $P(x) = \|p_{ik}(x)\|$ is a matrix. The system of 
Hamilton-Jacobi equations corresponding to (13) is 

$$\frac{\partial \psi_i}{\partial x} + \sum_{k=1}^{n}\frac{\partial \psi_i}{\partial y_k}\,\psi_k = \sum_{k=1}^{n} p_{ik}(x)y_k \quad (i = 1, \ldots, n). \tag{14}$$

Let us look for a solution of (14) which is linear in Y, i.e., 

$$\psi_i(x, y_1, \ldots, y_n) = \sum_{k=1}^{n} \alpha_{ik}(x)y_k,$$

or in vector notation, 

$$\Psi = AY. \tag{15}$$

Substituting (15) into (14), we obtain 

$$\sum_{k=1}^{n} \alpha_{ik}'(x)y_k + \sum_{k=1}^{n} \alpha_{ik}(x)\sum_{j=1}^{n} \alpha_{kj}(x)y_j = \sum_{k=1}^{n} p_{ik}(x)y_k,$$

or in matrix form 

$$\Big[\frac{d}{dx}A(x)\Big]Y + A^2(x)Y = P(x)Y,$$

where $A = \|\alpha_{ik}\|$. Thus, if the matrix A(x) satisfies the equation 

$$\frac{d}{dx}A(x) + A^2(x) = P(x),$$

which it is natural to call a matrix Riccati equation (cf. p. 120), the functions 
(15) define a field for the system (13), and this field is linear in Y. 

It is worth noting, although this observation will not be needed later, 
that the concept of a field is intimately related to the solution of boundary 
value problems for systems of second-order differential equations by the 
so-called “sweep method.” We illustrate this method by considering the 
very simple case where the system consists of a single linear differential 
equation 

$$y''(x) = p(x)y(x) + f(x), \tag{16}$$

with the boundary conditions 

$$y'(a) = c_0 y(a) + d_0, \tag{17}$$
$$y'(b) = c_1 y(b) + d_1. \tag{18}$$

We begin by constructing the first-order differential equation 

$$y'(x) = \alpha(x)y(x) + \beta(x) \tag{19}$$

and requiring that all its solutions satisfy the boundary condition (17) and 
the original equation (16). Obviously, to meet the first requirement, we 
must set 

$$\alpha(a) = c_0, \quad \beta(a) = d_0. \tag{20}$$

To meet the second requirement, we differentiate (19), obtaining 

$$y''(x) = \alpha'(x)y(x) + \alpha(x)y'(x) + \beta'(x).$$

Substituting (19) for y'(x) in the right-hand side, we find that 

$$y''(x) = [\alpha'(x) + \alpha^2(x)]y(x) + \beta'(x) + \alpha(x)\beta(x),$$

from which it is clear that (19) implies (16) if 

$$\alpha'(x) + \alpha^2(x) = p(x), \qquad \beta'(x) + \alpha(x)\beta(x) = f(x). \tag{21}$$

Now let α(x) and β(x) be a solution of the system (21), satisfying the 
initial conditions (20). Once we have found α(x) and β(x), we can write a 
“boundary condition” 

$$y'(x_0) = \alpha(x_0)y(x_0) + \beta(x_0)$$

for every point $x_0$ in [a, b]. This process of shifting the boundary condition 
originally prescribed for x = a over to every other point in the interval 
[a, b] is called the “forward sweep.” In particular, setting x = b, we obtain 
the equation 

$$y'(b) = \alpha(b)y(b) + \beta(b),$$

which, together with the boundary condition (18), forms a system determining 
y(b) and y'(b). If these values are uniquely determined, our original boundary 
value problem has a unique solution, i.e., the solution of equation (19) which for 
x = b takes the value y(b) just found. This second stage in the solution of 
the boundary value problem is called the “backward sweep.” These 
considerations apply to the case of a single equation, but a similar method can be 
used to deal with systems of second-order differential equations. 

The use of the sweep method to solve the boundary value problem 
consisting of the differential equation (16) and the boundary conditions (17) 
and (18) has decided advantages over the more traditional method. [In the 
latter method, we first find a general solution of equation (16) and then choose 
the values of the arbitrary constants appearing in this solution in such a way 
that the boundary conditions (17) and (18) are satisfied.] These advantages 
are particularly marked in cases where one must resort to some kind of 
approximate numerical method in order to solve the problem.⁵ 

The connection between the sweep method and the concept (introduced 
earlier) of the field of a system of second-order differential equations is now 
entirely clear. In fact, in the simple case just considered, the forward sweep 
is nothing but the construction of a field linear in y for equation (16). 
Moreover, (21) is just the system of ordinary differential equations to which the 
Hamilton-Jacobi system reduces in the case where we are looking for a field 
linear in y of a single second-order differential equation.⁶ 

We might have constructed a field starting from the right-hand end point 
of the interval [a, b], rather than from the left-hand end point. Thus, our 
boundary value problem actually involves two fields for equation (16), 
one of which is determined by shifting the boundary condition (17) from a 
to b, and the other by shifting the boundary condition (18) from b to a. The 
solution of the boundary value problem consisting of the differential equation 
(16) and the boundary conditions (17) and (18) is a curve which is a common 
trajectory of these two fields. Thus, in the sweep method, we construct 
one field (the forward sweep) and then choose one of its trajectories which is 
simultaneously a trajectory of a second field (the backward sweep). 
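
The two stages of the sweep method can be sketched in a few lines. What follows is a minimal illustration under stated assumptions, not the numerical scheme of the reference cited in footnote 5: the forward sweep integrates the system (21) from the initial conditions (20) with a classical Runge-Kutta step, the value y(b) is obtained from (18) together with (19) at x = b, and the backward sweep integrates (19) from b to a with the trapezoidal rule. The function name `sweep_solve`, the step count, and the choice of integrators are all hypothetical; the sketch also assumes that α(x) stays finite on [a, b].

```python
import math

def sweep_solve(p, f, a, b, c0, d0, c1, d1, steps=1000):
    """Solve y'' = p(x) y + f(x), y'(a) = c0 y(a) + d0, y'(b) = c1 y(b) + d1."""
    step = (b - a) / steps
    xs = [a + k * step for k in range(steps + 1)]

    def rhs(x, alpha, beta):
        # System (21): alpha' = p - alpha^2, beta' = f - alpha * beta.
        return p(x) - alpha * alpha, f(x) - alpha * beta

    # Forward sweep: carry the boundary condition (17) from a to every grid point.
    alphas, betas = [c0], [d0]
    for k in range(steps):
        x, al, be = xs[k], alphas[-1], betas[-1]
        k1 = rhs(x, al, be)
        k2 = rhs(x + step / 2, al + step / 2 * k1[0], be + step / 2 * k1[1])
        k3 = rhs(x + step / 2, al + step / 2 * k2[0], be + step / 2 * k2[1])
        k4 = rhs(x + step, al + step * k3[0], be + step * k3[1])
        alphas.append(al + step / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]))
        betas.append(be + step / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))

    # At x = b: alpha(b) y + beta(b) = c1 y + d1 determines y(b).
    denom = alphas[-1] - c1
    if denom == 0:
        raise ValueError("y(b) is not uniquely determined")
    ys = [0.0] * (steps + 1)
    ys[-1] = (d1 - betas[-1]) / denom

    # Backward sweep: integrate y' = alpha y + beta from b to a (trapezoidal rule).
    for k in range(steps, 0, -1):
        numerator = ys[k] * (1 - step / 2 * alphas[k]) - step / 2 * (betas[k] + betas[k - 1])
        ys[k - 1] = numerator / (1 + step / 2 * alphas[k - 1])
    return xs, ys

# Hypothetical test problem: y'' = y, y'(0) = 0, y'(1) = sinh 1, with solution cosh x.
xs, ys = sweep_solve(lambda x: 1.0, lambda x: 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, math.sinh(1.0))
```

In this test problem the forward sweep produces α(x) = tanh x and β(x) = 0, and the backward sweep should reproduce y = cosh x to within the accuracy of the integrators.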


5 I. S. Berezin and N. P. Zhidkov, Методы вычислений, Том II (Computational 
Methods, Vol. II), Gos. Izd. Fiz.-Mat. Lit., Moscow (1959), Chap. 9, Sec. 9. 

6 In Example 1, we considered the even simpler homogeneous differential equation 
y'' = p(x)y, and correspondingly, we looked for a field of the homogeneous form 
y' = α(x)y. This led to the Riccati equation (12) for the function α(x), identical with 
the first of the equations (21). 
32. The Field of a Functional 


32.1. We now apply the considerations of the preceding section to variational 
problems. The Euler equations 

$$F_{y_i} - \frac{d}{dx}F_{y_i'} = 0 \quad (i = 1, \ldots, n)$$

corresponding to the functional 

$$\int_a^b F(x, y_1, \ldots, y_n, y_1', \ldots, y_n')\,dx \tag{22}$$

form a system of n second-order differential equations. In order to single 
out a definite solution of this system, we have to specify 2n supplementary 
conditions, which are usually given in the form of boundary conditions, i.e., 
relations connecting the values of $y_i$ and $y_i'$ at the end points of the interval 
[a, b] (there are n such relations at each end point). In many cases, of 
course, the boundary conditions are determined by the very functional under 
consideration. For example, consider the variable end point problem for 
the functional 

$$\int_a^b F(x, y_1, \ldots, y_n, y_1', \ldots, y_n')\,dx + g^{(1)}(a, y_1, \ldots, y_n) + g^{(2)}(b, y_1, \ldots, y_n), \tag{23}$$

differing from (22) by two functions $g^{(1)}$ and $g^{(2)}$ of the coordinates of the 
end points of the path along which the functional is considered. Calculating 
the variation of the functional (23), we obtain 

$$\int_a^b \sum_{i=1}^{n}\Big(F_{y_i} - \frac{d}{dx}F_{y_i'}\Big)h_i\,dx + \sum_{i=1}^{n} F_{y_i'}h_i\Big|_{x=a}^{x=b} + \sum_{i=1}^{n} g^{(1)}_{y_i}h_i(a) + \sum_{i=1}^{n} g^{(2)}_{y_i}h_i(b). \tag{24}$$

Setting (24) equal to zero, and assuming that the curve $y_i = y_i(x)$, 1 ≤ i ≤ n, 
is an extremal, we find that 

$$\sum_{i=1}^{n} F_{y_i'}h_i\Big|_{x=a}^{x=b} + \sum_{i=1}^{n} g^{(1)}_{y_i}h_i(a) + \sum_{i=1}^{n} g^{(2)}_{y_i}h_i(b) = 0. \tag{25}$$

Since $h_i(a)$ and $h_i(b)$ are arbitrary, (25) implies that 

$$\big(F_{y_i'} - g^{(1)}_{y_i}\big)\Big|_{x=a} = 0 \quad (i = 1, \ldots, n) \tag{26}$$

and 

$$\big(F_{y_i'} + g^{(2)}_{y_i}\big)\Big|_{x=b} = 0 \quad (i = 1, \ldots, n). \tag{27}$$

If $g^{(1)} = g^{(2)} = 0$, (25) implies 

$$F_{y_i'}\Big|_{x=a} = 0, \quad F_{y_i'}\Big|_{x=b} = 0 \quad (i = 1, \ldots, n),$$

i.e., the natural boundary conditions for a variable end point problem like 
the one considered in Sec. 6 [cf. Chap. 1, formula (29)].⁷ 

Next, we examine in more detail the boundary conditions corresponding 
to one end point, say x = a. For simplicity, we write g instead of $g^{(1)}$, 
and adopt the vector notation 

$$y = (y_1, \ldots, y_n), \quad y' = (y_1', \ldots, y_n'),$$

etc., in arguments of functions (cf. Sec. 29). As usual, we introduce the 
“momenta” (see footnote 15, p. 86) 

$$p_i(x, y, y') = F_{y_i'}(x, y, y') \quad (i = 1, \ldots, n), \tag{28}$$

and then write the boundary conditions (26) in the form 

$$p_i(x, y, y')\big|_{x=a} = g_{y_i}(x, y)\big|_{x=a} \quad (i = 1, \ldots, n). \tag{29}$$

The relations (29) determine $y_1'(a), \ldots, y_n'(a)$ as functions of $y_1(a), \ldots, y_n(a)$:⁸ 

$$y_i'(a) = \psi_i(y)\big|_{x=a} \quad (i = 1, \ldots, n). \tag{30}$$


Boundary conditions that can be derived in this way merit a special name: 

Definition 1. Given a functional 

$$\int_a^b F(x, y, y')\,dx,$$

with momenta (28), the boundary conditions (30), prescribed for x = a, 
are said to be self-adjoint if there exists a function g(x, y) such that 

$$p_i[x, y, \psi(y)]\big|_{x=a} = g_{y_i}(x, y)\big|_{x=a} \quad (i = 1, \ldots, n). \tag{31}$$

Theorem 1. The boundary conditions (30) are self-adjoint if and only 
if they satisfy the conditions 

$$\frac{\partial p_i[x, y, \psi(y)]}{\partial y_k}\bigg|_{x=a} = \frac{\partial p_k[x, y, \psi(y)]}{\partial y_i}\bigg|_{x=a} \quad (i, k = 1, \ldots, n), \tag{32}$$

called the self-adjointness conditions. 


7 It should also be noted that the boundary conditions corresponding to fixed end points 
can be regarded as a limiting case of the boundary conditions (26) and (27), although the 
latter involve the additional functions $g^{(1)}$ and $g^{(2)}$. For example, in the case of the functional 

$$\int_a^b F(x, y, y')\,dx - k[y(a) - A]^2,$$

the boundary condition at the left-hand end point is 

$$[F_{y'}(x, y, y') + 2k(y - A)]\big|_{x=a} = 0.$$

If we now let k → ∞, we obtain in the limit the boundary condition y(a) = A. Similar 
considerations apply to the case of several functions $y_1, \ldots, y_n$. 

8 The conditions (30) can be thought of as assigning a direction to every point of the 
hyperplane x = a. [Cf. formula (2).] 
Proof. If the boundary conditions (30) are self-adjoint, then (31) 
holds, and hence 

$$\frac{\partial p_i[x, y, \psi(y)]}{\partial y_k} = \frac{\partial^2 g(x, y)}{\partial y_i\,\partial y_k} = \frac{\partial p_k[x, y, \psi(y)]}{\partial y_i},$$

which is just (32). Conversely, if the boundary conditions (30) are such 
that the functions $p_i[x, y, \psi(y)]$ satisfy (32), then, for x = a, the $p_i$ are 
the partial derivatives with respect to $y_i$ of some function g(y),⁹ so that 
the boundary conditions (30) are self-adjoint in the sense of Definition 1. 

Remark. It is immediately clear that for n = 1, i.e., in the case of 
variational problems involving a single unknown function, any boundary 
condition is self-adjoint, and in fact, the self-adjointness conditions (32) disappear 
for n = 1. 
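
The self-adjointness conditions (32) amount to the symmetry of the matrix of partial derivatives of the $p_i[x, y, \psi(y)]$ with respect to the $y_k$, which can be tested numerically at sample points. The sketch below is illustrative only: the function names are hypothetical, and the example integrand F = (y_1'² + y_2'²)/2, for which $p_i = y_i'$, is an assumed choice rather than one made in the text. A check at a single point is of course necessary but not sufficient.

```python
def momentum_jacobian(momenta, psi, x, y, eps=1e-6):
    """Central-difference matrix J[i][k] = d p_i[x, y, psi(y)] / d y_k."""
    n = len(y)

    def p_of(point):
        return momenta(x, point, psi(point))

    jac = [[0.0] * n for _ in range(n)]
    for k in range(n):
        up, down = list(y), list(y)
        up[k] += eps
        down[k] -= eps
        p_up, p_down = p_of(up), p_of(down)
        for i in range(n):
            jac[i][k] = (p_up[i] - p_down[i]) / (2 * eps)
    return jac

def satisfies_self_adjointness(momenta, psi, x, y, tol=1e-6):
    """Check condition (32) at the single point (x, y)."""
    jac = momentum_jacobian(momenta, psi, x, y)
    n = len(y)
    return all(abs(jac[i][k] - jac[k][i]) <= tol
               for i in range(n) for k in range(i + 1, n))

# Assumed example: F = (y_1'^2 + y_2'^2) / 2, so that p_i = y_i'.
momenta = lambda x, y, dy: list(dy)
gradient_conditions = lambda y: [y[1], y[0]]    # psi = grad(y_1 * y_2)
rotation_conditions = lambda y: [-y[1], y[0]]   # not the gradient of any g

ok = satisfies_self_adjointness(momenta, gradient_conditions, 0.0, [0.3, -1.2])
bad = satisfies_self_adjointness(momenta, rotation_conditions, 0.0, [0.3, -1.2])
```

With this integrand, the conditions y' = ψ(y) are self-adjoint exactly when ψ is the gradient of some function g, so the first check should succeed (g = y_1 y_2) and the second should fail.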

32.2. In the preceding section, we introduced the concept of a field for a 
system of second-order differential equations. We now define the field of 
a functional: 

Definition 2. Given a functional 

$$\int_a^b F(x, y, y')\,dx, \tag{33}$$

with the system of Euler equations 

$$F_{y_i} - \frac{d}{dx}F_{y_i'} = 0 \quad (i = 1, \ldots, n), \tag{34}$$

we say that the boundary conditions 

$$y_i' = \psi_i^{(1)}(y) \quad (i = 1, \ldots, n), \tag{35}$$

prescribed for $x = x_1$, and the boundary conditions 

$$y_i' = \psi_i^{(2)}(y) \quad (i = 1, \ldots, n), \tag{36}$$

prescribed for $x = x_2$, are (mutually) consistent with respect to the 
functional (33) if they are consistent with respect to the system (34), i.e., 
if every extremal satisfying the boundary conditions (35) at $x = x_1$ 
also satisfies the boundary conditions (36) at $x = x_2$, and conversely. 

Definition 3. The family of boundary conditions 

$$y_i' = \psi_i(x, y) \quad (i = 1, \ldots, n), \tag{37}$$

9 See e.g., D. V. Widder, op. cit., Theorem 11, p. 251, and T. M. Apostol, Advanced 
Calculus, Addison-Wesley Publishing Co., Inc., Reading, Mass. (1957), Theorem 
10-48, p. 296. (We tacitly assume the required regularity of the functions $p_i$ and of 
their domain of definition.) 
prescribed for every x in the interval [a, b], is said to be a field of the
functional (33) if

1. The conditions (37) are self-adjoint for every x in [a, b];

2. The conditions (37) are consistent for every pair of points $x_1$, $x_2$ in
[a, b].

In other words, by a field of the functional (33) is meant a field for the
corresponding system of Euler equations (34) which satisfies the
self-adjointness conditions at every point x. The equations (37) represent a
system of first-order differential equations. Its general solution (the family
of trajectories of the field) is an n-parameter family of extremals such that
one and only one extremal passes through each point $(x, y_1, \ldots, y_n)$ of the
region where the field is defined. 10

We now give an effective criterion for a given family of boundary
conditions to be the field of a functional:


Theorem 2. 11 A necessary and sufficient condition for the family of
boundary conditions (37) to be a field of the functional (33) is that the
self-adjointness conditions

$$\frac{\partial p_i[x, y, \psi(x, y)]}{\partial y_k} = \frac{\partial p_k[x, y, \psi(x, y)]}{\partial y_i} \tag{38}$$

and the consistency conditions

$$\frac{\partial p_i[x, y, \psi(x, y)]}{\partial x} = -\frac{\partial H[x, y, \psi(x, y)]}{\partial y_i} \tag{39}$$

be satisfied at every point x in [a, b], where

$$p_i(x, y, y') = F_{y_i'}(x, y, y'), \tag{40}$$

and H is the Hamiltonian corresponding to the functional (33):

$$H(x, y, y') = -F(x, y, y') + \sum_{i=1}^{n} p_i(x, y, y')\,y_i'. \tag{41}$$


Proof. We have already shown in Theorem 1 that the conditions (38)
are necessary and sufficient for the boundary conditions

$$y_i' = \psi_i(x, y) \qquad (i = 1, \ldots, n) \tag{42}$$


10 In the calculus of variations, by a field (of extremals) of a functional is usually
meant an n-parameter family of extremals satisfying certain conditions, rather than a
family of boundary conditions of the type just described. However, as already remarked
(see footnote 3, p. 133), it seems to us that our somewhat different approach to the
concept of a field has certain advantages.

11 This theorem is the analog of the theorem of Sec. 31, and the system of partial
differential equations (39) is the analog of the Hamilton-Jacobi system (see p. 133).
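to be self-adjoint at every point x in [a, b]. Therefore, it only remains to
show that if (38) holds at every point x in [a, b], then the conditions (39)
are necessary and sufficient for the boundary conditions (42) to be
consistent for $a \le x \le b$. To prove this, we set

$$y_i' = \psi_i(x, y), \qquad y' = \psi(x, y)$$

in (40) and (41), and substitute the right-hand sides of the resulting
equations into (39). Performing the indicated differentiations and
dropping arguments (to keep the notation concise), we obtain

$$F_{y_i' x} + \sum_{k=1}^{n} F_{y_i' y_k'} \frac{\partial \psi_k}{\partial x} = F_{y_i} + \sum_{k=1}^{n} F_{y_k'} \frac{\partial \psi_k}{\partial y_i} - \sum_{k=1}^{n} \frac{\partial F_{y_k'}}{\partial y_i}\,\psi_k - \sum_{k=1}^{n} F_{y_k'} \frac{\partial \psi_k}{\partial y_i}. \tag{43}$$

Using the self-adjointness conditions

$$\frac{\partial F_{y_i'}}{\partial y_k} = \frac{\partial F_{y_k'}}{\partial y_i},$$

we can write (43) in the form

$$F_{y_i} = F_{y_i' x} + \sum_{k=1}^{n} F_{y_i' y_k'} \frac{\partial \psi_k}{\partial x} + \sum_{k=1}^{n} \frac{\partial F_{y_i'}}{\partial y_k}\,\psi_k. \tag{44}$$

Since

$$\frac{\partial F_{y_i'}}{\partial y_k} = F_{y_i' y_k} + \sum_{j=1}^{n} F_{y_i' y_j'} \frac{\partial \psi_j}{\partial y_k},$$

(44) becomes

$$F_{y_i} = F_{y_i' x} + \sum_{k=1}^{n} F_{y_i' y_k}\,\psi_k + \sum_{k=1}^{n} F_{y_i' y_k'} \frac{\partial \psi_k}{\partial x} + \sum_{k=1}^{n} \left( \sum_{j=1}^{n} F_{y_i' y_j'} \frac{\partial \psi_j}{\partial y_k} \right) \psi_k. \tag{45}$$

Along the trajectories of the field, we have

$$\frac{dy_k}{dx} = \psi_k,$$

so that

$$\frac{d^2 y_k}{dx^2} = \frac{\partial \psi_k}{\partial x} + \sum_{j=1}^{n} \frac{\partial \psi_k}{\partial y_j}\,\psi_j.$$

Therefore, (45) reduces to

$$F_{y_i} = F_{y_i' x} + \sum_{k=1}^{n} F_{y_i' y_k} \frac{dy_k}{dx} + \sum_{k=1}^{n} F_{y_i' y_k'} \frac{d^2 y_k}{dx^2}$$

along the trajectories of the field, or

$$F_{y_i} - \frac{d}{dx} F_{y_i'} = 0, \tag{46}$$

where $1 \le i \le n$. This means that the trajectories of the field of
directions (42) are extremals, i.e., (42) is a field of the functional

$$\int_a^b F(x, y, y')\,dx, \tag{47}$$

and hence the conditions (39) are sufficient. Since the calculations
leading from (39) to (46) are reversible, the conditions (39) are also
necessary, and the theorem is proved.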
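
As a hedged illustration (not part of the original text), the conditions of Theorem 2 can be checked by hand in the simplest case n = 1. The integrand $F = \tfrac12 y'^2$ and the field $\psi(x, y) = y/x$ below are assumed test choices; for n = 1 self-adjointness is automatic, so only the consistency condition (39) has to be verified.

```latex
% Illustrative example: n = 1, F = (1/2) y'^2 (an assumed test integrand)
\begin{align*}
p(x, y, y') &= F_{y'} = y', &
H(x, y, y') &= -F + p\,y' = \tfrac12 y'^2, \\
p[x, y, \psi] &= \psi, &
H[x, y, \psi] &= \tfrac12 \psi^2 .
\end{align*}
% The consistency condition (39) becomes
\[
\frac{\partial \psi}{\partial x} = -\psi\,\frac{\partial \psi}{\partial y}.
\]
% The assumed field \psi(x, y) = y/x (for x > 0) satisfies it:
\[
\frac{\partial \psi}{\partial x} = -\frac{y}{x^2}, \qquad
-\psi\,\frac{\partial \psi}{\partial y} = -\frac{y}{x}\cdot\frac{1}{x} = -\frac{y}{x^2}.
\]
```

The trajectories of $y' = y/x$ are the straight lines $y = Cx$, which are indeed extremals of this functional, in agreement with the theorem.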

Theorem 3. The expression

$$\frac{\partial p_i(x, y, y')}{\partial y_k} - \frac{\partial p_k(x, y, y')}{\partial y_i} \tag{48}$$

has a constant value along each extremal.

Proof. Using (46), we find that

$$\frac{d}{dx}\left( \frac{\partial p_i}{\partial y_k} - \frac{\partial p_k}{\partial y_i} \right) = \frac{\partial F_{y_i}}{\partial y_k} - \frac{\partial F_{y_k}}{\partial y_i} = 0.$$

Corollary. Suppose the boundary conditions

$$y_i' = \psi_i(x, y) \qquad (a \le x \le b;\ 1 \le i \le n) \tag{49}$$

are consistent, i.e., suppose the solutions of the system (49) are extremals
of the functional (47). Then, to prove that the conditions (49) define a
field of the functional (47), it is only necessary to verify that they are
self-adjoint at a single (arbitrary) point in [a, b].

According to Definition 1, the boundary conditions (49) are self-adjoint
if there exists a function g(x, y) such that

$$p_i[x, y, \psi(x, y)] = g_{y_i}(x, y) \qquad (i = 1, \ldots, n) \tag{50}$$

for $a \le x \le b$. We now ask the following question: What condition has
to be imposed on the function g(x, y) in order for the boundary conditions
(49), defined by the relations (50), to be not only self-adjoint, but also
consistent, at every point of [a, b], i.e., for the boundary conditions (49)
to be a field of the functional (47)? The answer is given by

Theorem 4. The boundary conditions (49) defined by the relations (50)
are consistent if and only if the function g(x, y) satisfies the
Hamilton-Jacobi equation 12

$$\frac{\partial g}{\partial x} + H\left(x, y_1, \ldots, y_n, \frac{\partial g}{\partial y_1}, \ldots, \frac{\partial g}{\partial y_n}\right) = 0. \tag{51}$$

12 Cf. equation (72), p. 90. 
Proof. It follows from (50) that the Hamilton-Jacobi equation (51)
can be written in the form

$$\frac{\partial g}{\partial x} = -H(x, y_1, \ldots, y_n, p_1, \ldots, p_n), \tag{52}$$

where $p_i = p_i[x, y, \psi(x, y)]$. Differentiating (52) with respect to $y_i$,
we obtain

$$\frac{\partial^2 g}{\partial x\,\partial y_i} = -\frac{\partial H[x, y_1, \ldots, y_n, \psi_1(x, y), \ldots, \psi_n(x, y)]}{\partial y_i},$$

i.e.,

$$\frac{\partial p_i}{\partial x} = -\frac{\partial H[x, y_1, \ldots, y_n, \psi_1(x, y), \ldots, \psi_n(x, y)]}{\partial y_i},$$

which is just the set of consistency conditions (39).

Remark. The connection between the Hamilton-Jacobi system
introduced in Sec. 31 and the Hamilton-Jacobi equation introduced in Sec. 23 is
now apparent. As we saw in Sec. 31, in the case of an arbitrary system of
n second-order differential equations, a field is a system of n first-order
differential equations of the form (49), where the functions $\psi_i(x, y)$ satisfy
the Hamilton-Jacobi system (8). When we deal with the field of a functional,
the system (8) turns into the consistency conditions (39), and in this case,
we impose the additional requirement that the boundary conditions defining
the field be self-adjoint at every point. This means that the field of a
functional is not really determined by n functions $\psi_i(x, y)$, but rather by
a single function g(x, y) from which the functions $\psi_i(x, y)$ are derived by using
the relations (50). In other words, the function g(x, y) is a kind of potential
for the field of a functional. Since the field of a functional is determined by
a single function, instead of by n functions, it is entirely natural that the set
of n consistency conditions for such a field should reduce to a single equation,
i.e., that the Hamilton-Jacobi system should be replaced by the
Hamilton-Jacobi equation.
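
The role of g(x, y) as a potential can be checked numerically in a simple case. The following is a minimal sketch under stated assumptions, not part of the original text: the integrand $F = \tfrac12 y'^2$ (so that $p = y'$ and $H = \tfrac12 p^2$), the candidate potential $g = y^2/(2x)$, and all function names are illustrative choices. The sketch evaluates the left-hand side of the Hamilton-Jacobi equation (51) by finite differences and recovers the field slope from the relation (50).

```python
# Illustrative check (assumed example): F = y'^2 / 2, so p = y' and H = p^2 / 2.
# Candidate potential g(x, y) = y^2 / (2 x) for x > 0.

def g(x, y):
    return y * y / (2.0 * x)

def hamiltonian(p):
    # H expressed through the momentum p for F = y'^2 / 2
    return 0.5 * p * p

def partial(f, x, y, wrt, h=1e-5):
    # Central finite difference with respect to "x" or "y"
    if wrt == "x":
        return (f(x + h, y) - f(x - h, y)) / (2.0 * h)
    return (f(x, y + h) - f(x, y - h)) / (2.0 * h)

def hj_residual(x, y):
    # Left-hand side of (51): g_x + H(x, y, g_y)
    return partial(g, x, y, "x") + hamiltonian(partial(g, x, y, "y"))

def field_slope(x, y):
    # Relation (50) with p = y': psi(x, y) = g_y(x, y)
    return partial(g, x, y, "y")

sample_points = [(1.0, 0.5), (2.0, -1.0), (3.0, 2.0)]
residuals = [hj_residual(x, y) for x, y in sample_points]
slopes = [field_slope(x, y) for x, y in sample_points]
print(residuals)  # close to zero, up to finite-difference error
print(slopes)     # close to y / x at each sample point
```

In this assumed example the slopes agree with $\psi = y/x$, the field of straight lines through the origin, so a single function g does determine the whole field.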

32.3. Once more, we consider a functional

$$\int_a^b F(x, y, y')\,dx, \tag{53}$$

whose extremals are curves in the (n + 1)-dimensional space of points
$(x, y) = (x, y_1, \ldots, y_n)$. Let R be a simply connected region in this space,
and let $c = (c_0, c_1, \ldots, c_n)$ be a point lying outside R.

Definition 4. Let (x, y) be an arbitrary point of R, and suppose that
one and only one extremal of the functional (53) leaves c and passes
through (x, y), thereby defining a direction

$$y_i' = \psi_i(x, y) \qquad (i = 1, \ldots, n) \tag{54}$$

at every point of R. Then the field of directions (54) is called a central field.
Theorem 5. Every central field (54) is a field of the functional (53),
i.e., satisfies the consistency and self-adjointness conditions.

Proof. Consider the function

$$g(x, y) = \int_c^{(x, y)} F(x, y, y')\,dx, \tag{55}$$

where the integral is taken along the extremal of (53) joining the point
c to the point (x, y). We define a field of directions in R by setting

$$F_{y_i'}(x, y, y') = p_i(x, y, y') = g_{y_i}(x, y) \qquad (i = 1, \ldots, n). \tag{56}$$

The theorem will be proved if it can be shown that this field coincides with
the original field (54), since then the original field will satisfy the
consistency conditions [since its trajectories are extremals] and also the
self-adjointness conditions [this follows from Theorem 1 applied to the field
defined by (56)]. But (55) is just the function $S(x, y_1, \ldots, y_n)$ of Sec. 23,
and hence

$$g_{y_i}(x, y) = p_i(x, y, z),$$

where z denotes the slope of the extremal joining c to (x, y), evaluated
at (x, y). 13 This shows that the field of directions (56) actually coincides
with the original field (54).

Definition 5. Given an extremal $\gamma$ of the functional (53), suppose there
exists a simply connected (open) region R containing $\gamma$ such that

1. A field of the functional (53) covers R, i.e., is defined at every point
of R;

2. One of the trajectories of the field is $\gamma$.

Then we say that $\gamma$ can be imbedded in a field [of the functional (53)].

Theorem 6. Let $\gamma$ be an extremal of the functional (53), with equation

$$y = y(x) \qquad (a \le x \le b),$$

in vector form. Moreover, suppose that

$$\det \|F_{y_i' y_k'}\|$$

is nonvanishing in [a, b], and that no points conjugate to $(a, y(a))$ lie on $\gamma$.
Then $\gamma$ can be imbedded in a field.

Proof. By hypothesis, the following two conditions are satisfied for
sufficiently small $\varepsilon > 0$:

1. The extremal $\gamma$ can be extended onto the whole interval $[a - \varepsilon, b]$;

2. The interval $[a - \varepsilon, b]$ contains no points conjugate to a (cf.
footnote 20, p. 121).


13 See the second of the formulas (70) and footnote 18, p. 90. 
Now consider the family of extremals leaving the point $(a - \varepsilon, y(a - \varepsilon))$.
Since there are no points conjugate to $a - \varepsilon$ in the interval $[a - \varepsilon, b]$, it
follows that for $a \le x \le b$ no two extremals in this family which are
sufficiently close to the original extremal $\gamma$ can intersect. Thus, in some
region R containing $\gamma$, the extremals sufficiently close to $\gamma$ define a central
field in which $\gamma$ is imbedded. The proof is now completed by using
Theorem 5.


33. Hilbert’s Invariant Integral 

As before, let R be a simply connected region in the (n + 1)-dimensional
space of points $(x, y) = (x, y_1, \ldots, y_n)$, and let

$$y_i' = \psi_i(x, y) \qquad (i = 1, \ldots, n) \tag{57}$$

define a field of the functional

$$\int_a^b F(x, y, y')\,dx \tag{58}$$

in R. It was proved in the preceding section (see Theorem 2) that the field
of directions (57) is a field of the functional (58) if and only if the functions
$\psi_i(x, y)$ satisfy the self-adjointness conditions

$$\frac{\partial p_i[x, y, \psi(x, y)]}{\partial y_k} = \frac{\partial p_k[x, y, \psi(x, y)]}{\partial y_i} \tag{59}$$

and the consistency conditions

$$\frac{\partial H[x, y, \psi(x, y)]}{\partial y_i} = -\frac{\partial p_i[x, y, \psi(x, y)]}{\partial x}. \tag{60}$$

Taken together, the conditions (59) and (60) imply that the quantity

$$-H[x, y, \psi(x, y)]\,dx + \sum_{i=1}^{n} p_i[x, y, \psi(x, y)]\,dy_i$$

is the exact differential of some function (see footnote 9, p. 139)

$$g(x, y) = g(x, y_1, \ldots, y_n).$$

As is familiar from elementary analysis, 14 this function, which is determined
to within an additive constant, can be written as a line integral

$$g(x, y) = \int_\Gamma \left( -H\,dx + \sum_{i=1}^{n} p_i\,dy_i \right) \tag{61}$$

evaluated along the curve $\Gamma$ going from some fixed point $M_0 = (x_0, y(x_0))$ to
the variable point $M = (x, y)$. Since the integrand of (61) is an exact

14 See e.g., D. V. Widder, op. cit., Theorem 12, p. 251.
differential, the choice of the curve $\Gamma$ does not matter; in fact, the value of the
integral depends only on the points $M_0$ and $M$, and not on the curve $\Gamma$. The
right-hand side of (61) is known as Hilbert's invariant integral.

Using the equations (57) defining the field, and explicitly introducing
the integrand F of the functional (58), we can write the integral in (61) as

$$\int_\Gamma \left( F[x, y, \psi(x, y)] - \sum_{i=1}^{n} \psi_i(x, y)\,F_{y_i'}[x, y, \psi(x, y)] \right) dx + \sum_{i=1}^{n} F_{y_i'}[x, y, \psi(x, y)]\,dy_i. \tag{62}$$

This expression is Hilbert's invariant integral, in the form corresponding
to the field defined by the functions $\psi_i(x, y)$. If the curve $\Gamma$ along which the
integral (62) is evaluated is one of the trajectories of the field, then

$$dy_i = \psi_i(x, y)\,dx$$

along $\Gamma$, and hence (62) reduces to

$$\int_\Gamma F(x, y, y')\,dx$$

evaluated along this trajectory.

Remark. If $\gamma$ is an extremal which is a trajectory of the field, Hilbert's
invariant integral can be used to write the value of the functional for this
extremal as an integral evaluated along any curve joining the end points of $\gamma$.
This important fact will be used in the next section.
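
The path independence of Hilbert's invariant integral can be observed numerically. The following is a minimal sketch under stated assumptions, not part of the original text: it takes n = 1, the integrand $F = \tfrac12 y'^2$, and the field $\psi(x, y) = y/x$ (straight lines through the origin), for which (62) becomes $\int (-\tfrac12 \psi^2\,dx + \psi\,dy)$ with potential $g = y^2/(2x)$. The function names and the two comparison curves are illustrative.

```python
# Illustrative check (assumed example): F = y'^2 / 2 with the field psi = y / x.

def psi(x, y):
    return y / x

def hilbert_integral(curve, dcurve, x0, x1, steps=20000):
    # Midpoint rule for the integral (62) along the curve y = curve(x):
    # (F - psi * F_y') dx + F_y' dy, where F = psi^2 / 2 and F_y' = psi are
    # evaluated at the field slope psi(x, y), and dy = curve'(x) dx.
    h = (x1 - x0) / steps
    total = 0.0
    for j in range(steps):
        x = x0 + (j + 0.5) * h
        y = curve(x)
        s = psi(x, y)
        total += (0.5 * s * s - s * s + s * dcurve(x)) * h
    return total

# Two different curves joining M0 = (1, 1) to M = (2, 3)
line = hilbert_integral(lambda x: 2.0 * x - 1.0, lambda x: 2.0, 1.0, 2.0)
parabola = hilbert_integral(lambda x: 1.0 + 2.0 * (x - 1.0) ** 2,
                            lambda x: 4.0 * (x - 1.0), 1.0, 2.0)

# Difference of the potential g = y^2 / (2 x) between the end points
potential_difference = 3.0 ** 2 / (2.0 * 2.0) - 1.0 ** 2 / (2.0 * 1.0)
print(line, parabola, potential_difference)
```

Both curves give the same value, equal to the difference of the potential at the end points, even though neither curve is a trajectory of the field.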


34. The Weierstrass E-Function. Sufficient Conditions for a 
Strong Extremum 

Definition. By the Weierstrass E-function of the functional 15

$$J[y] = \int_a^b F(x, y, y')\,dx, \qquad y(a) = A, \quad y(b) = B \tag{63}$$

we mean the following function of 3n + 1 variables:

$$E(x, y, z, w) = F(x, y, w) - F(x, y, z) - \sum_{i=1}^{n} (w_i - z_i)\,F_{y_i'}(x, y, z). \tag{64}$$

In other words, E(x, y, z, w) is the difference between the value of the 


15 Here $y(a) = A$ means $y_1(a) = A_1, \ldots, y_n(a) = A_n$, and similarly for $y(b) = B$,
i.e., we are dealing with the fixed end point problem.
function F (regarded as a function of its last n arguments) at the point w and
the first two terms of its Taylor's series expansion about the point z. Thus,
E(x, y, z, w) can also be written as the remainder of a Taylor's series:

$$E(x, y, z, w) = \frac{1}{2} \sum_{i, k=1}^{n} (w_i - z_i)(w_k - z_k)\,F_{y_i' y_k'}[x, y, z + \theta(w - z)] \qquad (0 < \theta < 1).$$

For n = 1, the Weierstrass E-function has a simple geometric interpretation,
since if we regard F(x, y, z) as a function of z,

$$F(x, y, w) - F(x, y, z) - (w - z)\,F_{y'}(x, y, z)$$

is just the vertical distance from the curve $\Gamma$ representing F(x, y, z) to the
tangent to $\Gamma$ drawn through a fixed point of $\Gamma$.
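
To make the definition concrete, here is a minimal sketch (an illustration under stated assumptions, not part of the original text) that evaluates formula (64) for n = 1 and the integrand $F = y'^3$, the one appearing in Prob. 6 at the end of this chapter. The function names are hypothetical. For this integrand, (64) simplifies to $E = (w - z)^2 (w + 2z)$, so E changes sign as w varies.

```python
# Illustrative sketch (assumed example): n = 1 and F(x, y, y') = y'^3.

def F(x, y, slope):
    return slope ** 3

def F_slope(x, y, slope):
    # Partial derivative of F with respect to y'
    return 3.0 * slope ** 2

def weierstrass_E(x, y, z, w):
    # Formula (64) for n = 1
    return F(x, y, w) - F(x, y, z) - (w - z) * F_slope(x, y, z)

z = 1.0  # slope of the reference direction
values = {w: weierstrass_E(0.0, 0.0, z, w) for w in (-3.0, -2.0, 0.0, 1.0, 2.0)}
print(values)
```

With z = 1 the value at w = -3 is negative while the values at w = 0 and w = 2 are positive, which is the kind of sign change that separates weak from strong extrema.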

Our goal in this section is to derive sufficient conditions for the functional
(63) to have a strong extremum. It will be recalled from Secs. 28 and 29
that the following set of conditions is sufficient for the functional (63) to have
a weak minimum 16 for the admissible curve $\gamma$:

Condition 1. The curve $\gamma$ is an extremal;

Condition 2. The matrix $\|F_{y_i' y_k'}\|$ is positive definite along $\gamma$;

Condition 3. The interval [a, b] contains no points conjugate to a.

Every strong extremum is simultaneously a weak extremum, but the
converse is in general false (see p. 13). Therefore, in looking for sufficient
conditions for a strong extremum, it is natural to assume from the outset
that the three conditions just listed are satisfied. We then try to supplement
them in such a way as to obtain a set of conditions guaranteeing a strong
extremum as well as a weak extremum. To find such supplementary
conditions, we first recall that Conditions 2 and 3 imply that the given extremal $\gamma$
can be imbedded in a field

$$y_i' = \psi_i(x, y) \qquad (i = 1, \ldots, n) \tag{65}$$

of the functional (63) [see Theorem 6 of Sec. 32]. 17 Let $\gamma$ have the equations

$$y_i = y_i(x) \qquad (i = 1, \ldots, n),$$

and let $\gamma^*$ be an arbitrary curve with the same end points as $\gamma$, lying in the
(n + 1)-dimensional region R containing $\gamma$ and covered by the field (see


16 To be explicit, we consider only conditions for a minimum. To obtain conditions 
for a maximum, we need only reverse the directions of all inequalities. 

17 The only part of Condition 2 that is used here is the fact that $\det \|F_{y_i' y_k'}\|$ is
nonvanishing (in fact, positive) in [a, b].
Definition 5 of Sec. 32). Then, according to equation (62) and the remark
at the end of Sec. 33, we have

$$\int_\gamma F(x, y, y')\,dx = \int_{\gamma^*} \left( F(x, y, \psi) - \sum_{i=1}^{n} \psi_i\,F_{y_i'}(x, y, \psi) \right) dx + \sum_{i=1}^{n} F_{y_i'}(x, y, \psi)\,dy_i, \tag{66}$$

where for simplicity we omit the arguments of the functions $\psi$ and $\psi_i$. The
right-hand side of (66) is just Hilbert's invariant integral, in the form
corresponding to the field (65). As usual, we are interested in the increment

$$\Delta J = \int_{\gamma^*} F(x, y, y')\,dx - \int_\gamma F(x, y, y')\,dx.$$

Using (66), we find that

$$\begin{aligned}
\Delta J &= \int_{\gamma^*} F(x, y, y')\,dx - \int_{\gamma^*} \left[ \left( F(x, y, \psi) - \sum_{i=1}^{n} \psi_i\,F_{y_i'}(x, y, \psi) \right) dx + \sum_{i=1}^{n} F_{y_i'}(x, y, \psi)\,dy_i \right] \\
&= \int_{\gamma^*} \left( F(x, y, y') - F(x, y, \psi) - \sum_{i=1}^{n} (y_i' - \psi_i)\,F_{y_i'}(x, y, \psi) \right) dx,
\end{aligned}$$

or in terms of the Weierstrass E-function, 18

$$\Delta J = \int_{\gamma^*} E(x, y, \psi, y')\,dx. \tag{67}$$

We are now in a position to state sufficient conditions for a strong 
extremum. 

Theorem 1. Let $\gamma$ be an extremal, and let

$$y_i' = \psi_i(x, y) \qquad (i = 1, \ldots, n) \tag{68}$$

be a field of the functional

$$J[y] = \int_a^b F(x, y, y')\,dx, \qquad y(a) = A, \quad y(b) = B. \tag{69}$$

Suppose that at every point $(x, y) = (x, y_1, \ldots, y_n)$ of some (open) region
containing $\gamma$ and covered by the field (68), 19 the condition

$$E(x, y, \psi, w) \ge 0 \tag{70}$$

is satisfied for every finite vector $w = (w_1, \ldots, w_n)$. Then J[y] has a
strong minimum for the extremal $\gamma$.


18 More explicitly,

$$\Delta J = \int_a^b E(x, y^*, \psi, y^{*\prime})\,dx,$$

where $y_i = y_i^*(x)$ are the equations of the curve $\gamma^*$.

19 By hypothesis, such a region R exists. 
Proof. To say that the functional J[y] has a strong minimum for the
extremal $\gamma$ means that $\Delta J$ is nonnegative for any admissible curve $\gamma^*$
which is sufficiently close to $\gamma$ in the norm of the space $\mathcal{C}(a, b)$. But the
condition (70) guarantees that the increment $\Delta J$, given by (67), is
nonnegative for all such curves. Note that we do not impose any restrictions
at all on the slope of the curve $\gamma^*$, i.e., $\gamma^*$ need not be close to $\gamma$ in the
norm of the space $\mathcal{D}_1(a, b)$. In fact, $\gamma^*$ need not even belong to $\mathcal{D}_1(a, b)$. 20

Remark 1. As already noted, the hypothesis that the extremal $\gamma$ can be
imbedded in a field can be replaced by Conditions 2 and 3.

Remark 2. Since the Weierstrass E-function can be written in the form

$$E(x, y, \psi, w) = \frac{1}{2} \sum_{i, k=1}^{n} (w_i - \psi_i)(w_k - \psi_k)\,F_{y_i' y_k'}[x, y, \psi + \theta(w - \psi)] \qquad (0 < \theta < 1)$$

(see p. 147), we can replace (70) by the condition that at every point of some
region R containing $\gamma$, the matrix $\|F_{y_i' y_k'}(x, y, z)\|$ be nonnegative definite
for every finite z.

We conclude this section by indicating the following necessary condition 
for a strong extremum: 

Theorem 2 (Weierstrass' necessary condition). If the functional

$$J[y] = \int_a^b F(x, y, y')\,dx, \qquad y(a) = A, \quad y(b) = B$$

has a strong minimum for the extremal $\gamma$, then

$$E(x, y, y', w) \ge 0 \tag{71}$$

along $\gamma$ for every finite w.

The idea of the proof is the following: If (71) is not satisfied, there exists
a point $\xi$ in [a, b] and a vector q such that

$$E[\xi, y(\xi), y'(\xi), q] < 0, \tag{72}$$

where y = y(x) is the equation of the extremal $\gamma$. It can then be shown that
a suitable modification of $\gamma$ leads to an admissible curve $\gamma^*$ close to $\gamma$ in
the norm of the space $\mathcal{C}(a, b)$ such that

$$\Delta J = \int_{\gamma^*} F(x, y, y')\,dx - \int_\gamma F(x, y, y')\,dx < 0, \tag{73}$$

which contradicts the hypothesis that J[y] has a strong minimum for $\gamma$.
However, the construction of $\gamma^*$ must be carried out carefully, since all we
know is that (72) holds for a suitable q (see Probs. 9 and 10).


20 In problems involving strong extrema of the functional (69), we allow broken 
extremals, i.e., the admissible curves need only be piecewise smooth (and satisfy the 
boundary conditions). 
PROBLEMS

1. Find the curve joining the points (-1, -1) and (1, 1) which minimizes
the functional

$$J[y] = \int_{-1}^{1} (x^2 y'^2 + 12 y^2)\,dx.$$

What is the nature of the minimum?

Hint. $\Delta J = J[y + h] - J[y] = \int_{-1}^{1} (x^2 h'^2 + 12 h^2)\,dx > 0.$

Ans. J[y] has a strong minimum for $y = x^3$.

2. Find the curve joining the points (1, 3) and (2, 5) which minimizes the
functional

$$J[y] = \int_1^2 y'(1 + x^2 y')\,dx.$$

What is the nature of the minimum?

Hint. Again calculate $\Delta J$.

3. Prove that the segment of the x-axis joining $x = 0$ to $x = \pi$ corresponds to
a weak minimum but not a strong minimum of the functional

$$J[y] = \int_0^\pi y^2 (1 - y'^2)\,dx, \qquad y(0) = 0, \quad y(\pi) = 0.$$

Hint. Calculate J[y] for

$$y = \frac{1}{\sqrt{n}} \sin nx.$$

4. Prove that the extrema of the functional

$$\int_a^b n(x, y) \sqrt{1 + y'^2}\,dx$$

are always strong minima if $n(x, y) > 0$ for all x and y.

5. Investigate the extrema of the following functionals:

a) $J[y] = \int_{-1}^{2} y'(1 + x^2 y')\,dx, \quad y(-1) = 1, \quad y(2) = 1$;

b) $J[y] = \int_0^{\pi/4} (4y^2 - y'^2 + 8y)\,dx, \quad y(0) = -1, \quad y(\pi/4) = 0$;

c) $J[y] = \int_1^2 (x^2 y'^2 + 12 y^2)\,dx, \quad y(1) = 1, \quad y(2) = 8$;

d) $J[y] = \int_0^1 (y'^2 + y^2 + 2y e^{2x})\,dx, \quad y(0) = \tfrac13, \quad y(1) = \tfrac13 e^2$.

Ans. b) A strong maximum for $y = \sin 2x - 1$; d) A strong minimum for
$y = \tfrac13 e^{2x}$.

6. Prove that $y = bx/a$ is a weak minimum but not a strong minimum of the
functional

$$J[y] = \int_0^a y'^3\,dx,$$

where $y(0) = 0$, $y(a) = b$, $a > 0$, $b > 0$.

Hint. Examine the corresponding Weierstrass E-function.
7. Show that the extremals which give weak minima in Chap. 5, Prob. 10
do not give strong minima.

8. Show that the extremal y = 0 of the functional

$$J[y] = \int_0^1 (a y'^2 - 4b y y'^3 + 2b x y'^4)\,dx,$$

where

$$y(0) = 0, \quad y(1) = 0, \quad a > 0, \quad b > 0,$$

satisfies both the strengthened Legendre condition and Weierstrass' necessary
condition. Also verify that y = 0 can be imbedded in a field of the
functional J[y]. Does y = 0 correspond to a strong minimum of J[y]?

Hint. Choose

$$y = y_h(x) = \begin{cases} \ldots & \text{for } 0 \le x \le h, \\ \ldots & \text{for } h \le x \le 1. \end{cases}$$

Then, given any k > 0 however small, there is an h > 0 such that $J[y_h] < 0$.

Ans. No.


9. Complete the proof of Weierstrass' necessary condition, begun on p. 149.

Hint. By continuity of the E-function, we can always arrange for the point
$\xi$ to be an interior point of [a, b]. Choose h > 0 such that $\xi - h > a$, and
construct the function

$$y = y_h(x) = \begin{cases} y(x) + (x - a)Q & \text{for } a \le x \le \xi - h, \\ (x - \xi)q + y(\xi) & \text{for } \xi - h \le x \le \xi, \\ y(x) & \text{for } \xi \le x \le b, \end{cases}$$

where y = y(x) is the equation of the extremal $\gamma$, and Q is the vector
determined by the condition

$$y(\xi - h) + (\xi - a - h)Q = -qh + y(\xi).$$

Then let $\Delta(h) = J[y_h] - J[y]$. Prove that $\Delta'(0) = E[\xi, y(\xi), y'(\xi), q] < 0$,
which, together with $\Delta(0) = 0$, implies that $J[y_h] - J[y] < 0$ for small
enough h.

10. Give another proof of Weierstrass' necessary condition, based on the
direct use of Hilbert's invariant integral.

Hint. Let $M_1$ be the point $(\xi, y(\xi))$. From a point $M_0$ on $\gamma$ sufficiently
close to $M_1$, construct a central field of the functional. Let R be the region
covered by this field, and let $\Phi(M)$ be the value of Hilbert's invariant integral
evaluated along any curve in R joining $M_0$ to the variable point M in R.
Draw two surfaces $\sigma_2$ and $\sigma_1$ of the one-parameter family $\Phi(M) = \text{const}$,
the first intersecting $\gamma$ in a point $M_2$ lying between $M_0$ and $M_1$, the second
intersecting $\gamma$ in the point $M_1$. Moreover, from $M_1$ draw the straight line
with direction q, and let this line intersect $\sigma_2$ in a point $M_3$. Finally, let $\gamma^*$
be obtained from $\gamma$ by replacing the part of $\gamma$ from $M_0$ to $M_1$ by the curve
$M_0 M_3 M_1$, where $M_0 M_3$ is the extremal from $M_0$ to $M_3$ and $M_3 M_1$ is the
straight line segment from $M_3$ to $M_1$. Again using Hilbert's invariant
integral, prove that $\gamma^*$ satisfies the inequality (72).
VARIATIONAL PROBLEMS
INVOLVING
MULTIPLE INTEGRALS


In this chapter, we discuss a variety of topics pertaining to functionals 
which depend on functions of two or more variables. Such functionals 
arise, for example, in mechanical problems involving systems with infinitely 
many degrees of freedom (strings, membranes, etc.). In our treatment of 
systems consisting of a finite number of particles (see Chapter 4), we derived 
the principle of least action and a general method for obtaining conservation 
laws (Noether’s theorem). These methods will now be applied to systems 
with infinitely many degrees of freedom. 


35. Variation of a Functional Defined on a Fixed Region 
Consider the functional

$$J[u] = \int \cdots \int_R F(x_1, \ldots, x_n, u, u_{x_1}, \ldots, u_{x_n})\,dx_1 \cdots dx_n, \tag{1}$$

depending on n independent variables $x_1, \ldots, x_n$, an unknown function u
of these variables, and the partial derivatives $u_{x_1}, \ldots, u_{x_n}$ of u. (As usual,
it is assumed that the integrand F has continuous first and second derivatives
with respect to all its arguments.) We now calculate the variation of (1),
assuming that the region R stays fixed, while the function $u(x_1, \ldots, x_n)$
goes into

$$u^*(x_1, \ldots, x_n) = u(x_1, \ldots, x_n) + \varepsilon \psi(x_1, \ldots, x_n) + \cdots, \tag{2}$$
where the dots denote terms of order higher than 1 relative to $\varepsilon$. By the
variation $\delta J$ of the functional (1), corresponding to the transformation (2),
we mean the principal linear part (in $\varepsilon$) of the difference

$$J[u^*] - J[u].$$

For simplicity, we write $u(x)$, $\psi(x)$ instead of $u(x_1, \ldots, x_n)$, $\psi(x_1, \ldots, x_n)$,
$dx$ instead of $dx_1 \cdots dx_n$, etc. Then, using Taylor's theorem, we find that

$$\begin{aligned}
J[u^*] - J[u] &= \int_R \{ F[x, u(x) + \varepsilon \psi(x), u_{x_1}(x) + \varepsilon \psi_{x_1}(x), \ldots, u_{x_n}(x) + \varepsilon \psi_{x_n}(x)] \\
&\qquad - F[x, u(x), u_{x_1}(x), \ldots, u_{x_n}(x)] \}\,dx \\
&= \varepsilon \int_R \left( F_u \psi + \sum_{i=1}^{n} F_{u_{x_i}} \psi_{x_i} \right) dx + \cdots,
\end{aligned}$$

where the dots again denote terms of order higher than 1 relative to $\varepsilon$. It
follows that

$$\delta J = \varepsilon \int_R \left( F_u \psi + \sum_{i=1}^{n} F_{u_{x_i}} \psi_{x_i} \right) dx \tag{3}$$

is the variation of the functional (1).

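Next, we try to represent the variation of the functional (1) as an integral
of an expression of the form

$$G(x)\psi(x) + \operatorname{div}(\cdots),$$

i.e., we try to transform the expression (3) in such a way that the derivatives
$\psi_{x_i}$ only appear in a combination of terms which can be written as a
divergence. To achieve this, we replace

$$F_{u_{x_i}} \psi_{x_i}(x) \quad \text{by} \quad \frac{\partial}{\partial x_i} \left[ F_{u_{x_i}} \psi(x) \right] - \left( \frac{\partial}{\partial x_i} F_{u_{x_i}} \right) \psi(x)$$

in (3), obtaining

$$\delta J = \varepsilon \int_R \left( F_u - \sum_{i=1}^{n} \frac{\partial}{\partial x_i} F_{u_{x_i}} \right) \psi(x)\,dx + \varepsilon \int_R \sum_{i=1}^{n} \frac{\partial}{\partial x_i} \left[ F_{u_{x_i}} \psi(x) \right] dx. \tag{4}$$

This expression for the variation $\delta J$ has the important feature that its second
term is the integral of a divergence, and hence can be reduced to an integral
over the boundary $\Gamma$ of the region R. In fact, let $d\sigma$ be the area of a variable
element of $\Gamma$, regarded as an (n − 1)-dimensional surface. Then the
n-dimensional version of Green's theorem states that

$$\int_R \sum_{i=1}^{n} \frac{\partial}{\partial x_i} \left[ F_{u_{x_i}} \psi(x) \right] dx = \int_\Gamma \psi(x)\,(G, \nu)\,d\sigma, \tag{5}$$

where

$$G = (F_{u_{x_1}}, \ldots, F_{u_{x_n}})$$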
is the n-dimensional vector whose components are the derivatives $F_{u_{x_i}}$,
$\nu = (\nu_1, \ldots, \nu_n)$ is the unit outward normal to $\Gamma$, and $(G, \nu)$ denotes the
scalar product of G and $\nu$. Using (5), we can write (4) in the form

$$\delta J = \varepsilon \int_R \left( F_u - \sum_{i=1}^{n} \frac{\partial}{\partial x_i} F_{u_{x_i}} \right) \psi(x)\,dx + \varepsilon \int_\Gamma \psi(x)\,(G, \nu)\,d\sigma, \tag{6}$$

where the integral over R no longer involves the derivatives of $\psi(x)$.

In order for the functional (1) to have an extremum, we must require
that $\delta J = 0$ for all admissible $\psi(x)$, in particular, that $\delta J = 0$ for all admissible
$\psi(x)$ which vanish on the boundary $\Gamma$. For such functions, (6) reduces to

$$\delta J = \varepsilon \int_R \left( F_u - \sum_{i=1}^{n} \frac{\partial}{\partial x_i} F_{u_{x_i}} \right) \psi(x)\,dx,$$

and then, because of the arbitrariness of $\psi(x)$ inside R, $\delta J = 0$ implies that

$$F_u - \sum_{i=1}^{n} \frac{\partial}{\partial x_i} F_{u_{x_i}} = 0 \tag{7}$$

for all $x \in R$. This is the Euler equation of the functional (1), and is the
n-dimensional generalization of formula (24) of Sec. 5. 1

Remark. In deriving (7), we assumed that the region of integration R 
appearing in the functional (1) is fixed. Generalization of (7) to the case 
where the region of integration is variable will be made in Sec. 36. 
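
As a short worked illustration (an assumed example, not taken from the text above), consider n = 2 with independent variables written as x and y, and the Dirichlet-type integrand $F = u_x^2 + u_y^2$. Applying the Euler equation (7) term by term gives Laplace's equation:

```latex
% Illustrative example: n = 2, F = u_x^2 + u_y^2 (an assumed test integrand)
\[
J[u] = \iint_R \left( u_x^2 + u_y^2 \right) dx\,dy, \qquad
F_u = 0, \quad F_{u_x} = 2u_x, \quad F_{u_y} = 2u_y .
\]
% Euler equation (7):
\[
F_u - \frac{\partial}{\partial x} F_{u_x} - \frac{\partial}{\partial y} F_{u_y}
  = -2\left( u_{xx} + u_{yy} \right) = 0
\quad\Longrightarrow\quad u_{xx} + u_{yy} = 0 .
\]
```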


36. Variational Derivation of the Equations of Motion of 
Continuous Mechanical Systems 

As we saw in Sec. 21, the equations of motion of a mechanical system
consisting of n particles can be derived from the principle of least action,
which states that the actual trajectory of the system in phase space
minimizes the action functional

$$\int_{t_0}^{t_1} (T - U)\,dt, \tag{8}$$

where T is the kinetic energy and U the potential energy of the system of
particles. We now use this principle, together with our basic formula for
the first variation, to derive the equations of motion and the appropriate
boundary conditions for some simple mechanical systems with infinitely
many degrees of freedom, namely, the vibrating string, membrane and plate.


1 As we shall see in the next section, boundary conditions for the equation (7) can be
obtained by removing the restriction that $\psi(x) = 0$ on $\Gamma$, and then setting $\delta J = 0$ after
substitution of (7) into (4) or (6).
36.1. The vibrating string. Consider the transverse motion of a string
(i.e., a homogeneous flexible cord) of length $l$ and linear mass density $\rho$.
Suppose the ends of the string (at $x = 0$ and $x = l$) are fastened elastically,
which means that if either end is displaced from its equilibrium position,
a restoring force proportional to the displacement appears. This can be
achieved, for example, by fastening the ends of the string to two rings which
are constrained to move along two parallel rods,
while the rings themselves are held in their initial
positions by two ideal springs, 2 as shown in Fig. 8.

Let the equilibrium position of the string lie
along the x-axis, and let $u(x, t)$ denote the
displacement of the string at the point x and time
t from its equilibrium position. Then, at time t,
the kinetic energy of the element of string
which initially lies between $x_0$ and $x_0 + \Delta x$ is
clearly

$$\tfrac12 \rho\,u_t^2(x_0, t)\,\Delta x. \tag{9}$$

Integrating (9) from 0 to $l$, we find that the kinetic energy of the whole string
at time t equals

$$T = \tfrac12 \rho \int_0^l u_t^2(x, t)\,dx. \tag{10}$$

To find the potential energy of the string, we use the following argument:
The potential energy of the string in the position described by the function
u(x, t), where t is fixed, is just the work required to move the string from
its equilibrium position u = 0 into the given position u(x, t). Let $\tau$ denote
the tension in the string, and consider the element of string indicated by AB
in Figure 9, which initially occupies the position DE along the x-axis, i.e.,
the interval $[x_0, x_0 + \Delta x]$. 3 To calculate the amount of work needed to move
DE to AB, we first move DE to the position AC. This requires no work at
all, since the force (the tension in the string) is perpendicular to the
displacement. 4 Next, we stretch the string from the position AC to the position
AC', where the length of AC' equals the length of AB. This obviously
requires an amount of work equal to $\tau\beta$, where $\beta$ is the length of CC'. Finally,
we rotate AC' about the point A into the final position AB. Like the first
step, this requires no work at all, since at each stage of the rotation the
force is perpendicular to the displacement. Thus, the total amount of work


2 The springs are ideal in the sense that they have zero length when not stretched. 

3 Since we only consider the case of small vibrations, the string can be assumed to have 
constant length and constant tension. In the present approximation, we can also assume 
that AB is a straight line segment. 

4 It should be emphasized that since the string is assumed to be absolutely flexible, 
all the work is expended in stretching the string, and none in bending it. 



Figure 8 (end points labeled $x = 0$ and $x = l$)
required to move DE to AB is just the product of $\tau$ and the increase in
length of the element of string, i.e., the quantity

$$\tau \sqrt{(\Delta x)^2 + (\Delta u)^2} - \tau\,\Delta x = \tfrac12 \tau \left( \frac{\Delta u}{\Delta x} \right)^2 \Delta x + \cdots = \tfrac12 \tau\,u_x^2(x_0, t)\,\Delta x + \cdots, \tag{11}$$

where the dots indicate terms of order higher than those written ($\Delta u / \Delta x \ll 1$
for all t, since the vibrations are small).
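
The size of the terms neglected in (11) can be gauged numerically. The following is a minimal sketch (an illustration with assumed sample slopes and hypothetical function names, not part of the original text) comparing the exact increase in length of the element with the leading term, both divided by the tension.

```python
import math

def exact_stretch(dx, du):
    # Increase in length of the element: sqrt(dx^2 + du^2) - dx
    return math.hypot(dx, du) - dx

def leading_term(dx, du):
    # Leading term of (11) divided by the tension: (1/2) (du/dx)^2 dx
    return 0.5 * (du / dx) ** 2 * dx

dx = 1.0
for slope in (0.1, 0.01, 0.001):
    du = slope * dx
    exact = exact_stretch(dx, du)
    approx = leading_term(dx, du)
    # The relative error shrinks roughly like slope^2 / 4
    print(slope, exact, approx, abs(exact - approx) / approx)
```

The relative discrepancy falls off quadratically with the slope, which is why only the quadratic term is retained for small vibrations.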



Integrating (11) from 0 to $l$, we find that the potential energy of the whole
string is

$$U_1 = \tfrac12 \tau \int_0^l u_x^2(x, t)\,dx, \tag{12}$$

except for the work expended in displacing the elastically fastened ends of
the string from their equilibrium positions. This work equals

$$U_2 = \tfrac12 \kappa_1 u^2(0, t) + \tfrac12 \kappa_2 u^2(l, t), \tag{13}$$

where $\kappa_1$ and $\kappa_2$ are positive constants (the elastic moduli of the springs).
[In fact, the force $f_1$ acting on the end point $P_1$ (see Figure 8) is proportional
to the displacement $\xi$ of $P_1$ from its equilibrium position $x = 0$, $u = 0$, i.e.,

$$|f_1| = \kappa_1 \xi, \tag{14}$$

where $\kappa_1 > 0$ is a constant; integration of (14) shows that the work required
to move $P_1$ from (0, 0) to (0, u(0, t)), its position at time t, is given by

$$\int_0^{u(0, t)} \kappa_1 \xi\,d\xi = \tfrac12 \kappa_1 u^2(0, t),$$

and similarly for the other end point $P_2$.] Then, adding (12) and (13), we find
that the total potential energy of the string in the position described by
the function u(x, t) is

$$U = U_1 + U_2 = \tfrac12 \tau \int_0^l u_x^2(x, t)\,dx + \tfrac12 \kappa_1 u^2(0, t) + \tfrac12 \kappa_2 u^2(l, t). \tag{15}$$
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Finally, using (10) and (15), we write the action (8) for the vibrating string, 
obtaining the functional 


J[u] = ^ j t 1 J o [puf(x, t) - T u*(x, t )] dx dt 

-\x 1 f' 1 w 2 (0, t)dt-\ x 2 f 1 u\l, t) dt. 

I Jtn l J tn 


( 16 ) 


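According to the principle of least action, $\delta J$ must vanish for the function
$u(x, t)$ which describes the actual motion of the string. Thus, we now
calculate the variation $\delta J$ of the functional (16). Suppose we go from the
function $u(x, t)$ to the “varied” function

$$u^*(x, t) = u(x, t) + \varepsilon\psi(x, t) + \cdots.$$

Then, using formula (4) and the fact that the variation of a sum equals the
sum of the variations of the separate terms, we find that

$$\begin{aligned}
\delta J = {} & \varepsilon\Big\{\int_{t_0}^{t_1}\!\!\int_0^l [-\rho u_{tt}(x, t) + \tau u_{xx}(x, t)]\,\psi(x, t)\,dx\,dt\\
& \quad - \kappa_1\int_{t_0}^{t_1} u(0, t)\,\psi(0, t)\,dt - \kappa_2\int_{t_0}^{t_1} u(l, t)\,\psi(l, t)\,dt\Big\}\\
& - \varepsilon\int_{t_0}^{t_1}\!\!\int_0^l \frac{\partial}{\partial x}[\tau u_x\psi]\,dx\,dt
 + \varepsilon\int_{t_0}^{t_1}\!\!\int_0^l \frac{\partial}{\partial t}[\rho u_t\psi]\,dx\,dt.
\end{aligned} \tag{17}$$

If we assume that the admissible functions $\psi(x, t)$ are such that

$$\psi(x, t_0) = 0, \qquad \psi(x, t_1) = 0 \qquad (0 \le x \le l),$$

i.e., that $u(x, t)$ is not varied at the initial and final times, then the last term
in (17) vanishes and the next to the last term reduces to

$$\varepsilon\int_{t_0}^{t_1} [\tau u_x(0, t)\,\psi(0, t) - \tau u_x(l, t)\,\psi(l, t)]\,dt.$$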

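It follows that the variation (17) can be written in the form

$$\begin{aligned}
\delta J = \varepsilon\Big\{ & \int_{t_0}^{t_1}\!\!\int_0^l [-\rho u_{tt}(x, t) + \tau u_{xx}(x, t)]\,\psi(x, t)\,dx\,dt\\
& - \int_{t_0}^{t_1} [\kappa_1 u(0, t) - \tau u_x(0, t)]\,\psi(0, t)\,dt\\
& - \int_{t_0}^{t_1} [\kappa_2 u(l, t) + \tau u_x(l, t)]\,\psi(l, t)\,dt\Big\}.
\end{aligned} \tag{18}$$

According to the principle of least action, the expression (18) must vanish
for the function $u(x, t)$ corresponding to the actual motion of the string.
Suppose first that $\psi(x, t)$ vanishes at the ends of the string,$^5$ i.e., that

$$\psi(0, t) = 0, \qquad \psi(l, t) = 0 \qquad (t_0 \le t \le t_1). \tag{19}$$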


5 If $\delta J$ vanishes for all admissible $\psi(x, t)$, it certainly vanishes for all admissible $\psi(x, t)$
satisfying the extra condition (19).

Then (18) reduces to just

$$\delta J = \varepsilon\int_{t_0}^{t_1}\!\!\int_0^l [-\rho u_{tt}(x, t) + \tau u_{xx}(x, t)]\,\psi(x, t)\,dx\,dt. \tag{20}$$

Setting (20) equal to zero, and using the arbitrariness of the interval $[t_0, t_1]$
and of the function $\psi(x, t)$ for $0 < x < l$, $t_0 < t < t_1$ (cf. the lemma of
Sec. 5), we find that

$$u_{tt}(x, t) = a^2 u_{xx}(x, t) \qquad \left(a^2 = \frac{\tau}{\rho}\right) \tag{21}$$

for $0 \le x \le l$ and all $t$. This result, called the equation of the vibrating
string, is the Euler equation of the functional

$$\frac{1}{2}\int_{t_0}^{t_1}\!\!\int_0^l [\rho u_t^2(x, t) - \tau u_x^2(x, t)]\,dx\,dt.$$

Next, we remove the restriction (19). Since $u(x, t)$ must satisfy (21), the
first term in (18) vanishes, and we have

$$\delta J = -\varepsilon\left\{\int_{t_0}^{t_1} [\kappa_1 u(0, t) - \tau u_x(0, t)]\,\psi(0, t)\,dt + \int_{t_0}^{t_1} [\kappa_2 u(l, t) + \tau u_x(l, t)]\,\psi(l, t)\,dt\right\}. \tag{22}$$

This expression must also vanish for the function $u(x, t)$ corresponding to the
actual motion of the string. Since $[t_0, t_1]$ is arbitrary and $\psi(0, t)$, $\psi(l, t)$ are
arbitrary admissible functions, equating (22) to zero leads to the relations

$$\kappa_1 u(0, t) - \tau u_x(0, t) = 0 \tag{23}$$

and

$$\kappa_2 u(l, t) + \tau u_x(l, t) = 0 \tag{24}$$

for all $t$. Thus, finally, the function $u(x, t)$ which describes the oscillations
of the string must satisfy (21) and the boundary conditions

$$\alpha u(0, t) + u_x(0, t) = 0 \qquad \left(\alpha = -\frac{\kappa_1}{\tau}\right) \tag{25}$$

and

$$\beta u(l, t) + u_x(l, t) = 0 \qquad \left(\beta = \frac{\kappa_2}{\tau}\right), \tag{26}$$

which connect the displacement from equilibrium and the direction of the
tangent at each end of the string.

Next, suppose the ends of the string are free, which means that the springs
shown in Fig. 8 are absent and the rings fastening the string to the lines
$x = 0$, $x = l$ can move up and down freely. Then $\kappa_1 = \kappa_2 = 0$, and the
boundary conditions (23), (24) become

$$u_x(0, t) = 0, \qquad u_x(l, t) = 0.$$

Thus, at a free end point, the tangent to the string always preserves the same
slope (zero) as it had in the equilibrium position.

The case where the ends of the string are fixed, corresponding to the
boundary conditions

$$u(0, t) = 0, \qquad u(l, t) = 0, \tag{27}$$

can be regarded as a limit of the case of elastically fastened ends. In fact,
let the stiffness of the springs binding the ends of the string to their initial
positions increase without limit, i.e., let $\kappa_1 \to \infty$, $\kappa_2 \to \infty$. Then, dividing
(23) by $\kappa_1$ and (24) by $\kappa_2$, and taking this limit, we obtain the conditions (27).
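
As an illustration of the equation of the vibrating string (21) with the fixed-end conditions (27), the following sketch integrates the equation with a standard leapfrog finite-difference scheme and compares the result with the standing wave $\sin(\pi x/l)\cos(a\pi t/l)$. The scheme, the grid size, the Courant number and the function name `simulate_fixed_string` are assumptions made for this example only; they are not part of the text.

```python
import math

def simulate_fixed_string(a=1.0, length=1.0, n=50, courant=0.5, t_end=2.0):
    """Leapfrog scheme for u_tt = a^2 u_xx with u(0,t) = u(l,t) = 0.

    Returns the largest deviation from the standing wave
    sin(pi x / l) * cos(a pi t / l) at the final time step.
    """
    dx = length / n
    dt = courant * dx / a
    steps = round(t_end / dt)
    xs = [i * dx for i in range(n + 1)]

    def exact(t):
        return [math.sin(math.pi * x / length) * math.cos(a * math.pi * t / length)
                for x in xs]

    prev, curr = exact(0.0), exact(dt)   # two exact time levels start the scheme
    c2 = courant ** 2
    for _ in range(steps - 1):
        nxt = [0.0] * (n + 1)            # end values stay 0: fixed ends (27)
        for i in range(1, n):
            nxt[i] = (2.0 * curr[i] - prev[i]
                      + c2 * (curr[i + 1] - 2.0 * curr[i] + curr[i - 1]))
        prev, curr = curr, nxt
    reference = exact(steps * dt)
    return max(abs(u - r) for u, r in zip(curr, reference))

max_error = simulate_fixed_string()
print(f"max deviation from the standing wave: {max_error:.2e}")
```

The end values of each new time level are never updated, which is how the discrete scheme imposes the conditions (27).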


36.2. Least action vs. stationary action. The principle of least action is
widely used not only in mechanics, but also in other branches of physics,
e.g., in electrodynamics and field theory. However, as already noted (see
Remark 2, p. 85), in a certain sense the principle is not quite true. For
example, consider a simple harmonic oscillator, i.e., a particle of mass $m$
oscillating about an equilibrium position under the action of an elastic
restoring force (cf. Chap. 4, Prob. 2). The equation of motion of the particle is

$$m\ddot{x} + \kappa x = 0, \tag{28}$$

with solution

$$x = C\sin(\omega t + \theta), \tag{29}$$

where

$$\omega = \sqrt{\frac{\kappa}{m}},$$

and the values of the constants $C$, $\theta$ are determined from the initial conditions.
Moreover, the particle has kinetic energy

$$T = \frac{1}{2}\,m\dot{x}^2$$

and potential energy

$$U = \frac{1}{2}\,\kappa x^2,$$

so that the action is

$$\frac{1}{2}\int_{t_0}^{t_1} (m\dot{x}^2 - \kappa x^2)\,dt. \tag{30}$$

Equation (28) is the Euler equation of the functional (30), but in general
we cannot assert that its solution (29) actually minimizes (30). In fact,
consider the solution

$$x = \frac{1}{\omega}\sin\omega t, \tag{31}$$

which passes through the point $x = 0$, $t = 0$ and satisfies the condition
$\dot{x}(0) = 1$. The point $(\pi/\omega, 0)$ is conjugate to the point $(0, 0)$, since every
extremal satisfying the condition $x(0) = 0$ intersects the extremal (31) at $(\pi/\omega, 0)$
[see p. 114]. Since

$$F_{\dot{x}\dot{x}} = m > 0$$

for the functional (30), the extremal (31) satisfies the sufficient conditions
for a minimum (in fact, a strong minimum), provided that

$$0 \le t \le t_0 < \frac{\pi}{\omega}.$$

However, if we consider time intervals greater than $\pi/\omega$, we can no longer
guarantee that the extremal (31) minimizes the functional (30).
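
The role of the length $\pi/\omega$ can be illustrated numerically. Since the action (30) is quadratic, replacing an extremal $x$ by $x + h$, with $h$ vanishing at both end points, changes the action by exactly $\frac{1}{2}\int (m\dot{h}^2 - \kappa h^2)\,dt$. The sketch below evaluates this integral (without the factor $\frac{1}{2}$) for one particular test variation, a half sine arch, on an interval shorter and on an interval longer than $\pi/\omega$. The values of `m` and `kappa`, the choice of test variation and the function names are assumptions made for illustration.

```python
import math

def second_variation(h, h_dot, m, kappa, t_end, n=2000):
    """Midpoint-rule estimate of the integral of m*h'^2 - kappa*h^2 over [0, t_end]."""
    dt = t_end / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * dt
        total += (m * h_dot(t) ** 2 - kappa * h(t) ** 2) * dt
    return total

m, kappa = 1.0, 4.0                    # assumed sample values
omega = math.sqrt(kappa / m)           # natural frequency
critical = math.pi / omega             # distance to the conjugate point

def probe(t_end):
    # Hypothetical test variation: half a sine arch vanishing at 0 and t_end.
    h = lambda t: math.sin(math.pi * t / t_end)
    h_dot = lambda t: (math.pi / t_end) * math.cos(math.pi * t / t_end)
    return second_variation(h, h_dot, m, kappa, t_end)

short = probe(0.8 * critical)   # interval shorter than pi/omega
long_ = probe(1.2 * critical)   # interval longer than pi/omega
print(f"short interval: {short:+.4f}, long interval: {long_:+.4f}")
```

A negative value on the longer interval means that this variation lowers the action, so the extremal is not a minimum there; a positive value for one variation on the shorter interval is of course only consistent with, not a proof of, the minimum property.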

Next, consider a system of $n$ coupled oscillators, with kinetic energy

$$T = \sum_{i,k=1}^n a_{ik}\dot{x}_i\dot{x}_k \tag{32}$$

(a quadratic form in the velocities $\dot{x}_i$) and potential energy

$$U = \sum_{i,k=1}^n b_{ik}x_i x_k \tag{33}$$

(a quadratic form in the coordinates $x_i$). The quadratic form (32) is positive
definite (since it is a kinetic energy); therefore, (32) and (33) can be simultaneously
reduced to sums of squares by a suitable linear transformation$^6$

$$x_i = \sum_{k=1}^n c_{ik}q_k \qquad (i = 1, \ldots, n), \tag{34}$$

i.e., substitution of (34) into (32) and (33) gives

$$T = \sum_{i=1}^n \dot{q}_i^2, \qquad U = \sum_{i=1}^n \lambda_i q_i^2.$$

Then the equations of motion of the system of oscillators are given by the
Euler equations

$$\frac{d}{dt}\left(\frac{\partial T}{\partial\dot{q}_i}\right) + \frac{\partial U}{\partial q_i} = 2\ddot{q}_i + 2\lambda_i q_i = 0 \qquad (i = 1, \ldots, n), \tag{35}$$

corresponding to the action functional

$$\int_{t_0}^{t_1} \sum_{i=1}^n (\dot{q}_i^2 - \lambda_i q_i^2)\,dt.$$


6 See e.g., G. E. Shilov, op. cit., Secs. 72 and 73. The coordinates $q_i$ are often called
normal coordinates, and the corresponding frequencies $\omega_i$ are called natural frequencies.

Suppose all the $\lambda_i$ are positive, which means that we are considering
oscillations of the system about a position of stable equilibrium. Then
the solution of the system (35) has the form

$$q_i = C_i\sin(\omega_i t + \theta_i) \qquad (i = 1, \ldots, n), \tag{36}$$

where

$$\omega_i = \sqrt{\lambda_i},$$

and the values of the constants $C_i$, $\theta_i$ are determined from the initial conditions.
An argument like that made for the simple harmonic oscillator
($n = 1$) shows that a trajectory of the system [i.e., a curve given by (36)
in a space of $n + 1$ dimensions] whose projection on the time axis is of length
no greater than $\pi/\omega$, where

$$\omega = \max_{1 \le i \le n} \omega_i,$$

contains no conjugate points and satisfies the sufficient conditions for a
minimum. However, just as before, we cannot guarantee that a trajectory
whose projection on the time axis is of length greater than $\pi/\omega$ actually
minimizes the action.
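
A small worked case may make the bound $\pi/\omega$ concrete. The sketch below assumes two coupled oscillators whose kinetic-energy matrix $a_{ik}$ is the identity, so that an orthogonal change of coordinates leaves $T$ a sum of squares and the $\lambda_i$ are simply the eigenvalues of the matrix $b_{ik}$. The particular matrix and the helper name `symmetric_2x2_eigenvalues` are hypothetical choices for this example.

```python
import math

def symmetric_2x2_eigenvalues(b11, b12, b22):
    """Eigenvalues of [[b11, b12], [b12, b22]] from the characteristic equation."""
    mean = 0.5 * (b11 + b22)
    radius = math.hypot(0.5 * (b11 - b22), b12)
    return mean - radius, mean + radius

# Hypothetical example: T = x1'^2 + x2'^2, U = 2*x1^2 - 2*x1*x2 + 2*x2^2,
# i.e. a_ik is the identity and b_ik = [[2, -1], [-1, 2]].
lambdas = symmetric_2x2_eigenvalues(2.0, -1.0, 2.0)
omegas = [math.sqrt(lam) for lam in lambdas]   # natural frequencies
bound = math.pi / max(omegas)                  # safe length of the time interval
print("lambda_i:", lambdas)
print("omega_i :", omegas)
print("pi / max omega_i:", bound)
```

The highest natural frequency determines the bound, which is the feature that fails for the string, where the frequencies are unbounded.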

Finally, consider a vibrating string of length $l$ with fixed ends.$^7$ As shown
above, the function $u(x, t)$ describing the oscillations of the string satisfies
the equation

$$u_{tt}(x, t) = a^2 u_{xx}(x, t)$$

and the boundary conditions

$$u(0, t) = 0, \qquad u(l, t) = 0.$$

It follows that$^8$

$$u(x, t) = \sum_{k=1}^{\infty} C_k(x)\sin(\omega_k t + \theta_k),$$

where

$$\omega_k = \frac{ka\pi}{l}, \tag{37}$$

and $C_k(x)$, $\theta_k$ are determined from the initial conditions. Thus, in a certain
sense, a vibrating string can be regarded as a system of infinitely many
coupled oscillators, with natural frequencies (37). However, the numbers
(37) have no finite upper bound, and hence the analogy with the case of $n$
coupled oscillators leads us to believe that for a vibrating string, there is no


7 Unlike the analysis of a system of $n$ oscillators, the elementary argument that
follows is meant to be heuristic rather than rigorous.

8 See e.g., G. P. Tolstov, Fourier Series, translated by R. A. Silverman, Prentice-Hall,
Inc., Englewood Cliffs, N. J. (1962), p. 271.

time interval short enough to guarantee that $u(x, t)$ actually minimizes
the action functional. Similar arguments can be carried out for other
systems with infinitely many degrees of freedom.

Guided by the above considerations, we shall henceforth replace the 
principle of least action by the principle of stationary action. In other words, 
the actual trajectory of a given mechanical system will not be required to 
minimize the action but only to cause its first variation to vanish. 

36.3. The vibrating membrane. Consider the transverse motion of a
membrane (i.e., a homogeneous flexible sheet) of surface mass density $\rho$.
Let $u(x, y, t)$ denote the displacement from equilibrium of the point $(x, y)$ of
the membrane, at time $t$. The kinetic energy of the membrane at time $t$ is
given by

$$T = \frac{1}{2}\,\rho \iint_R u_t^2(x, y, t)\,dx\,dy, \tag{38}$$

where $R$ is the region of the $xy$-plane occupied by the membrane at rest.
The potential energy of the membrane in the position described by the
function $u(x, y, t)$, where $t$ is fixed, is just the work required to move the
membrane from its equilibrium position $u = 0$ into the given position
$u(x, y, t)$. This work is the sum of the work $U_1$ expended in deforming the
membrane and the work $U_2$ expended in moving the boundary of the membrane,
which we assume to be elastically fastened to its equilibrium position.

To calculate $U_1$, let $\tau$ denote the tension in the membrane, and consider the
element $\Delta\sigma$ of the membrane initially occupying the region $x_0 \le x \le x_0 + \Delta x$,
$y_0 \le y \le y_0 + \Delta y$. Then, just as in the case of the string, the work needed
to deform $\Delta\sigma$ equals the product of $\tau$ and the increase in the area of $\Delta\sigma$
under deformation, i.e.,

$$\tau\sqrt{(\Delta x)^2 + (\Delta u)^2}\,\sqrt{(\Delta y)^2 + (\Delta u)^2} - \tau\,\Delta x\,\Delta y = \frac{1}{2}\,\tau[u_x^2(x_0, y_0, t) + u_y^2(x_0, y_0, t)]\,\Delta x\,\Delta y + \cdots, \tag{39}$$

where the dots indicate terms of order higher than those written. Integrating
(39) over $R$, we find that the work required to deform the whole membrane is

$$U_1 = \frac{1}{2}\,\tau \iint_R [u_x^2(x, y, t) + u_y^2(x, y, t)]\,dx\,dy. \tag{40}$$

To calculate $U_2$, we generalize the argument used to derive (14). If $\Gamma$
is the boundary of the region $R$, and $s$ is arc length measured along $\Gamma$ from
some fixed point on $\Gamma$, then

$$U_2 = \frac{1}{2}\int_\Gamma \kappa(s)\,u^2(s, t)\,ds, \tag{41}$$

where $u(s, t)$ is the displacement of the membrane from equilibrium at the
point $s$ and time $t$, and $\kappa(s)$ is the linear density of the elastic modulus of
the forces retaining the boundary of the membrane.$^9$ Combining (38), (40)
and (41), we find that the action functional for the vibrating membrane is

$$\begin{aligned}
J[u] &= \int_{t_0}^{t_1} (T - U_1 - U_2)\,dt\\
&= \frac{1}{2}\int_{t_0}^{t_1}\!\!\iint_R \{\rho u_t^2(x, y, t) - \tau[u_x^2(x, y, t) + u_y^2(x, y, t)]\}\,dx\,dy\,dt\\
&\quad - \frac{1}{2}\int_{t_0}^{t_1}\!\!\int_\Gamma \kappa(s)\,u^2(s, t)\,ds\,dt.
\end{aligned} \tag{42}$$

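Suppose we go from the function $u(x, y, t)$ to the “varied” function

$$u^*(x, y, t) = u(x, y, t) + \varepsilon\psi(x, y, t) + \cdots.$$

Then, using formula (4) of Sec. 35 and dropping arguments of functions, we
find that the variation $\delta J$ of the functional (42) is

$$\begin{aligned}
\delta J = {} & \varepsilon\int_{t_0}^{t_1}\!\!\iint_R [-\rho u_{tt} + \tau(u_{xx} + u_{yy})]\,\psi\,dx\,dy\,dt
 - \varepsilon\int_{t_0}^{t_1}\!\!\int_\Gamma \kappa u\psi\,ds\,dt\\
& - \varepsilon\tau\int_{t_0}^{t_1}\!\!\iint_R \left[\frac{\partial}{\partial x}(u_x\psi) + \frac{\partial}{\partial y}(u_y\psi)\right] dx\,dy\,dt
 + \varepsilon\rho\int_{t_0}^{t_1}\!\!\iint_R \frac{\partial}{\partial t}(u_t\psi)\,dx\,dy\,dt.
\end{aligned} \tag{43}$$

Just as in the case of the vibrating string, we assume that the function
$u(x, y, t)$ is not varied at the initial and final times, i.e., that

$$\psi(x, y, t_0) = \psi(x, y, t_1) \equiv 0. \tag{44}$$

Because of (44), the last integral in (43) vanishes. Moreover, using Green’s
theorem in two dimensions (see p. 23), we have

$$\begin{aligned}
\iint_R \left[\frac{\partial}{\partial x}(u_x\psi) + \frac{\partial}{\partial y}(u_y\psi)\right] dx\,dy
&= \int_\Gamma \psi\,(u_x\,dy - u_y\,dx)\\
&= \int_\Gamma \psi\,(u_x\cos\theta + u_y\sin\theta)\,ds
 = \int_\Gamma \psi\,\frac{\partial u}{\partial n}\,ds,
\end{aligned}$$

where $\partial/\partial n$ denotes differentiation with respect to $n$, the outward normal to
$\Gamma$, and $\theta$ is the angle between $n$ and the $x$-axis. Thus, we can finally write
(43) in the form

$$\delta J = \varepsilon\int_{t_0}^{t_1}\!\!\iint_R [-\rho u_{tt} + \tau(u_{xx} + u_{yy})]\,\psi\,dx\,dy\,dt
 - \varepsilon\int_{t_0}^{t_1}\!\!\int_\Gamma \left(\kappa u + \tau\frac{\partial u}{\partial n}\right)\psi\,ds\,dt. \tag{45}$$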

9 More precisely, let the parametric equations of $\Gamma$ be

$$x = x(s), \qquad y = y(s), \qquad s_0 \le s \le s_1.$$

Then $u(s, t)$ means $u[x(s), y(s), t]$, and “the point $s$” means the point $(x(s), y(s))$.

We first assume that

$$\psi(s, t) = 0 \qquad (s \in \Gamma), \tag{46}$$

where $t$ is arbitrary, i.e., that $u$ does not vary on the boundary of the membrane.
Then (45) reduces to just

$$\delta J = \varepsilon\int_{t_0}^{t_1}\!\!\iint_R [-\rho u_{tt} + \tau(u_{xx} + u_{yy})]\,\psi\,dx\,dy\,dt. \tag{47}$$

Setting (47) equal to zero, and using the arbitrariness of the interval $[t_0, t_1]$
and of the function $\psi = \psi(x, y, t)$ inside $R \times [t_0, t_1]$, we find that

$$u_{tt}(x, y, t) = a^2[u_{xx}(x, y, t) + u_{yy}(x, y, t)] \qquad \left(a^2 = \frac{\tau}{\rho}\right) \tag{48}$$

for $(x, y) \in R$ and all $t$, a result known as the equation of the vibrating membrane.$^{10}$
Equation (48) can also be written as

$$u_{tt}(x, y, t) = a^2\nabla^2 u(x, y, t),$$

in terms of the Laplacian (operator)

$$\nabla^2 = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}. \tag{49}$$


Next, we remove the restriction (46). Since $u(x, y, t)$ must satisfy (48),
the first term in (45) vanishes, and we are left with

$$\delta J = -\varepsilon\int_{t_0}^{t_1}\!\!\int_\Gamma \left[\kappa(s)\,u(s, t) + \tau\frac{\partial u(s, t)}{\partial n}\right]\psi(s, t)\,ds\,dt. \tag{50}$$

Then, since $\psi(s, t)$ is an arbitrary admissible function, equating (50) to zero
leads to the formula$^{11}$

$$\kappa(s)\,u(s, t) + \tau\frac{\partial u(s, t)}{\partial n} = 0 \qquad (s \in \Gamma). \tag{51}$$

This is the boundary condition satisfied by a vibrating membrane when its
boundary is elastically fastened to its equilibrium position. In particular,
if the boundary of the membrane is free, $\kappa(s) = 0$ and (51) becomes

$$\frac{\partial u(s, t)}{\partial n} = 0 \qquad (s \in \Gamma), \tag{52}$$

while if the boundary of the membrane is fixed, $\kappa(s) = \infty$ and (51) becomes

$$u(s, t) = 0 \qquad (s \in \Gamma). \tag{53}$$
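
As a quick plausibility check of the membrane equation (48) together with the fixed-boundary condition (53), the sketch below takes one trial mode on the unit square and estimates $u_{tt} - a^2(u_{xx} + u_{yy})$ by central differences. The unit square, the value of `a`, the sample points and the function names are assumptions made for illustration only.

```python
import math

a = 1.5                      # wave speed, a^2 = tau / rho (hypothetical value)
omega = a * math.pi * math.sqrt(2.0)

def u(x, y, t):
    """Trial mode on the unit square; it vanishes on the boundary, as in (53)."""
    return math.sin(math.pi * x) * math.sin(math.pi * y) * math.cos(omega * t)

def second_difference(f, h):
    """Central second difference of a one-argument function at 0."""
    return (f(h) - 2.0 * f(0.0) + f(-h)) / h**2

def residual(x, y, t, h=1e-3):
    """Finite-difference estimate of u_tt - a^2 (u_xx + u_yy) at one point."""
    u_tt = second_difference(lambda s: u(x, y, t + s), h)
    u_xx = second_difference(lambda s: u(x + s, y, t), h)
    u_yy = second_difference(lambda s: u(x, y + s, t), h)
    return u_tt - a**2 * (u_xx + u_yy)

points = [(0.3, 0.6, 0.2), (0.5, 0.5, 1.0), (0.8, 0.1, 0.7)]
residuals = [residual(*p) for p in points]
print(residuals)
```

The residuals are not exactly zero because the differences are approximate, but they should be small compared with the individual terms, which are of order $a^2\pi^2$.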


10 By $R \times [t_0, t_1]$ is meant the Cartesian product of $R$ and $[t_0, t_1]$, i.e., the set of all
points $(x, y, t)$ where $(x, y) \in R$ and $t \in [t_0, t_1]$.

11 The boundary conditions (51), (52) and (53) hold for all $t$.

36.4. The vibrating plate. Finally, we use the principle of stationary
action to derive the equation of motion and the boundary conditions for the
transverse vibrations of a plate (i.e., a homogeneous two-dimensional elastic
body) with surface mass density $\rho$. As in the case of the vibrating membrane,
let $u(x, y, t)$ denote the displacement from equilibrium of the point $(x, y)$ of
the plate, at time $t$. Then the kinetic energy of the plate at time $t$ is given by

$$T = \frac{1}{2}\,\rho \iint_R u_t^2(x, y, t)\,dx\,dy, \tag{54}$$

where $R$ is the region of the $xy$-plane occupied by the plate at rest [cf. (38)].

The potential energy of deformation of the plate, which we denote by $U_1$,
depends on how the plate is bent, and hence involves the second derivatives
$u_{xx}$, $u_{xy}$ and $u_{yy}$. Unlike the case of the membrane, it is assumed that no
work is done in stretching the plate, so that $U_1$ does not involve $u_x$ and $u_y$.
Moreover, we require $U_1$ to be a quadratic functional in $u_{xx}$, $u_{xy}$ and $u_{yy}$,$^{12}$
which does not depend on the orientation of the coordinate system. Then,
since the matrix

$$\begin{Vmatrix} u_{xx} & u_{xy}\\ u_{yx} & u_{yy} \end{Vmatrix}$$

has just two invariants under rotations, i.e., its trace and its determinant,$^{13}$
it follows that

$$U_1 = \iint_R [A(u_{xx} + u_{yy})^2 + B(u_{xx}u_{yy} - u_{xy}^2)]\,dx\,dy, \tag{55}$$

where $A$ and $B$ are constants. Equation (55) is usually written in the form

$$U_1 = \frac{1}{2}\,c \iint_R [(u_{xx} + u_{yy})^2 - 2(1 - \mu)(u_{xx}u_{yy} - u_{xy}^2)]\,dx\,dy, \tag{56}$$

where $c$ is a constant depending on the choice of units, and $\mu$ is an absolute
constant (Poisson's ratio) characterizing the material from which the plate is
made. For simplicity, we set $c = 1$.

In addition to the potential energy of deformation $U_1$, the total potential
energy of the plate may also contain a contribution $U_2$ due to bending
moments with density $m(s, t)$, prescribed on the boundary $\Gamma$ of $R$, and a
contribution $U_3$ due to external forces acting on $R$ with surface density
$f(x, y, t)$ and on $\Gamma$ with linear density $p(s, t)$. This would give

$$U_2 = \int_\Gamma m(s, t)\,\frac{\partial u(s, t)}{\partial n}\,ds, \tag{57}$$


12 This guarantees that the equation of motion of the plate is linear.

13 See e.g., G. E. Shilov, op. cit., p. 106.

where $\partial/\partial n$ denotes differentiation with respect to $n$, the outward normal
to $\Gamma$, and$^{14}$

$$U_3 = \iint_R f(x, y, t)\,u(x, y, t)\,dx\,dy + \int_\Gamma p(s, t)\,u(s, t)\,ds. \tag{58}$$

Combining (54), (56), (57) and (58), we find that the action functional for
the vibrating plate is

$$\begin{aligned}
J[u] &= \int_{t_0}^{t_1} (T - U_1 - U_2 - U_3)\,dt\\
&= \frac{1}{2}\int_{t_0}^{t_1}\!\!\iint_R [\rho u_t^2 - (u_{xx} + u_{yy})^2 + 2(1 - \mu)(u_{xx}u_{yy} - u_{xy}^2) - 2fu]\,dx\,dy\,dt\\
&\quad - \int_{t_0}^{t_1}\!\!\int_\Gamma \left(pu + m\frac{\partial u}{\partial n}\right) ds\,dt.
\end{aligned} \tag{59}$$

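Unlike the corresponding expressions for the vibrating string and the
vibrating membrane, (59) contains second derivatives of the unknown
function $u$. The variation of (59) corresponding to the transition from
$u(x, y, t)$ to

$$u^*(x, y, t) = u(x, y, t) + \varepsilon\psi(x, y, t) + \cdots$$

turns out to be (see Problems 4 and 5, p. 190)

$$\delta J = \varepsilon\int_{t_0}^{t_1}\!\!\iint_R (-\rho u_{tt} - \nabla^4 u - f)\,\psi\,dx\,dy\,dt
 + \varepsilon\int_{t_0}^{t_1}\!\!\int_\Gamma \left[(P - p)\,\psi + (M - m)\,\frac{\partial\psi}{\partial n}\right] ds\,dt. \tag{60}$$

Here,

$$M = -[\mu\nabla^2 u + (1 - \mu)(u_{xx}x_n^2 + 2u_{xy}x_n y_n + u_{yy}y_n^2)] \tag{61}$$

and

$$P = \frac{\partial}{\partial n}\nabla^2 u + (1 - \mu)\frac{\partial}{\partial s}[u_{xx}x_n x_s + u_{xy}(x_n y_s + x_s y_n) + u_{yy}y_n y_s], \tag{62}$$

where $\partial/\partial n$ denotes differentiation in the direction of the outward normal
to $\Gamma$, with direction cosines $x_n$, $y_n$, and $\partial/\partial s$ denotes differentiation in the
direction of the tangent to $\Gamma$, with direction cosines $x_s$, $y_s$. Moreover,

$$\nabla^4 u = \nabla^2(\nabla^2 u) = \frac{\partial^4 u}{\partial x^4} + 2\frac{\partial^4 u}{\partial x^2\,\partial y^2} + \frac{\partial^4 u}{\partial y^4},$$

according to (49).

We first assume that

$$\psi(s, t) = 0, \qquad \frac{\partial\psi(s, t)}{\partial n} = 0 \qquad (s \in \Gamma), \tag{63}$$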


14 An identical term might also have been included in the expression for the potential
energy of the vibrating membrane.

where $t$ is arbitrary, i.e., that $u$ and its normal derivative do not vary on the
boundary of the plate. Then (60) reduces to just

$$\delta J = \varepsilon\int_{t_0}^{t_1}\!\!\iint_R (-\rho u_{tt} - \nabla^4 u - f)\,\psi\,dx\,dy\,dt. \tag{64}$$

Setting (64) equal to zero, and using the arbitrariness of the interval $[t_0, t_1]$
and of the function $\psi = \psi(x, y, t)$ inside $R \times [t_0, t_1]$, we obtain the equation
for forced vibrations of the plate:$^{15}$

$$\rho u_{tt}(x, y, t) + \nabla^4 u(x, y, t) + f(x, y, t) = 0. \tag{65}$$

If we set $f = 0$, so that there are no external forces acting on the plate, (65)
reduces to the equation for free vibrations of the plate

$$\rho u_{tt}(x, y, t) + \nabla^4 u(x, y, t) = 0.$$

Finally, if we set $u_{tt} = 0$ in (65) and assume that $f = f(x, y)$ is independent
of time, we obtain an equation for the equilibrium position of the plate
under the action of external forces:

$$\nabla^4 u(x, y) + f(x, y) = 0.$$

This equation could have been obtained directly from the condition for the
potential energy of the plate to have a minimum (see Remark 2 below).

Next, we remove the restriction (63). Since $u(x, y, t)$ must satisfy (65),
the first term in (60) vanishes, and we are left with

$$\delta J = \varepsilon\int_{t_0}^{t_1}\!\!\int_\Gamma \left[(P - p)\,\psi + (M - m)\,\frac{\partial\psi}{\partial n}\right] ds\,dt. \tag{66}$$

Then, since the functions $\psi$, $\partial\psi/\partial n$ and the interval $[t_0, t_1]$ are arbitrary,
equating (66) to zero leads to the natural boundary conditions

$$P(s, t) - p(s, t) = 0, \qquad M(s, t) - m(s, t) = 0 \qquad (s \in \Gamma). \tag{67}$$

If the boundary of the plate is clamped, the conditions (67) are replaced by
the “imposed” boundary conditions

$$u(s, t) = 0, \qquad \frac{\partial u(s, t)}{\partial n} = 0 \qquad (s \in \Gamma).$$

If the plate is supported, i.e., if the boundary of the plate is held fixed while
the tangent plane at the boundary can vary, we obtain the boundary conditions

$$u(s, t) = 0, \qquad M(s, t) - m(s, t) = 0 \qquad (s \in \Gamma).$$


15 When domains of arguments are not specified, it is understood that $t$ is arbitrary
and $(x, y) \in R$.

Remark 1. It should be noted that the Euler equation (65) does not involve
the coefficient $\mu$. This is explained by the fact that the expression

$$u_{xx}u_{yy} - u_{xy}^2 \tag{68}$$

is the divergence of the vector

$$(u_x u_{yy},\ -u_x u_{xy}),$$

and hence has no effect on (65). However, (68) does have a decisive effect
on the boundary conditions, via the functions $M(s, t)$ and $P(s, t)$.
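
The divergence identity behind Remark 1 can be spot-checked numerically. The sketch below assumes a particular smooth test function, $u = \sin x \cos 2y$, with hand-coded partial derivatives, and compares a central-difference divergence of the vector $(u_x u_{yy}, -u_x u_{xy})$ with the expression (68) at a few sample points. The test function, the step size and the function names are hypothetical choices.

```python
import math

# Hypothetical test function u(x, y) = sin(x) * cos(2y) with hand-coded derivatives.
def u_x(x, y):  return math.cos(x) * math.cos(2 * y)
def u_xx(x, y): return -math.sin(x) * math.cos(2 * y)
def u_xy(x, y): return -2.0 * math.cos(x) * math.sin(2 * y)
def u_yy(x, y): return -4.0 * math.sin(x) * math.cos(2 * y)

def vector_field(x, y):
    """The vector (u_x * u_yy, -u_x * u_xy) from Remark 1."""
    return u_x(x, y) * u_yy(x, y), -u_x(x, y) * u_xy(x, y)

def divergence(x, y, h=1e-5):
    """Central-difference divergence of the vector field."""
    d_first = (vector_field(x + h, y)[0] - vector_field(x - h, y)[0]) / (2 * h)
    d_second = (vector_field(x, y + h)[1] - vector_field(x, y - h)[1]) / (2 * h)
    return d_first + d_second

def expression_68(x, y):
    """u_xx * u_yy - u_xy**2."""
    return u_xx(x, y) * u_yy(x, y) - u_xy(x, y) ** 2

samples = [(0.3, 0.7), (1.1, -0.4), (2.0, 1.5)]
gaps = [abs(divergence(x, y) - expression_68(x, y)) for x, y in samples]
print(gaps)
```

Analytically, the mixed third-derivative terms $u_x u_{xyy}$ produced by the two components cancel, which is why only (68) survives.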

Remark 2. For a mechanical system to be in equilibrium, its kinetic
energy $T$ must vanish and its potential energy $U$ must be independent of
time. Under these conditions, the principle of stationary action reduces to
the assertion that $\delta U = 0$. Thus, the equilibrium position of the system
corresponds to a stationary value of $U$. Moreover, it can be shown that this
stationary value must be a minimum if the equilibrium is to be stable and
hence physically realizable. In elasticity theory, this principle of minimum
potential energy is often replaced by Castigliano's principle, which states
that the equilibrium position of an elastic body corresponds to a minimum
of the work of deformation.$^{16}$


37. Variation of a Functional Defined on a Variable Region

37.1. Statement of the problem. In Sec. 35, we derived a formula for the
variation of the functional

$$J[u] = \int\cdots\int_R F(x_1, \ldots, x_n, u, u_{x_1}, \ldots, u_{x_n})\,dx_1 \cdots dx_n, \tag{69}$$

allowing only the function $u$ (and hence its derivatives) to vary, while leaving
the independent variables (and hence the region of integration $R$) unchanged.
We now find the variation of the functional (69) in the general case where the
independent variables $x_1, \ldots, x_n$ are varied, as well as the function $u$ and its
derivatives. For simplicity, we use vector notation, writing $x = (x_1, \ldots, x_n)$,
$dx = dx_1 \cdots dx_n$ and

$$\operatorname{grad} u \equiv \nabla u = (u_{x_1}, \ldots, u_{x_n}).$$

With this notation, (69) becomes

$$J[u] = \int_R F(x, u, \nabla u)\,dx. \tag{70}$$


16 For a detailed treatment of Castigliano’s principle and a proof of its equivalence
to the principle of minimum potential energy, see e.g., R. Courant and D. Hilbert,
Methods of Mathematical Physics, Vol. I, Interscience, Inc., New York (1953), pp. 268-272.

Now consider the family of transformations$^{17}$

$$\begin{aligned}
x_i^* &= \Phi_i(x, u, \nabla u; \varepsilon),\\
u^* &= \Psi(x, u, \nabla u; \varepsilon),
\end{aligned} \tag{71}$$

depending on a parameter $\varepsilon$, where the functions $\Phi_i$ ($i = 1, \ldots, n$) and $\Psi$
are differentiable with respect to $\varepsilon$, and the value $\varepsilon = 0$ corresponds to the
identity transformation:

$$\begin{aligned}
\Phi_i(x, u, \nabla u; 0) &= x_i,\\
\Psi(x, u, \nabla u; 0) &= u.
\end{aligned} \tag{72}$$

The transformation (71) carries the surface $\sigma$, with the equation

$$u = u(x) \qquad (x \in R),$$

into another surface $\sigma^*$. In fact, replacing $u$, $\nabla u$ in (71) by $u(x)$, $\nabla u(x)$,
and eliminating $x$ from the resulting $n + 1$ equations, we obtain the equation

$$u^* = u^*(x^*) \qquad (x^* \in R^*)$$

for $\sigma^*$, where $x^* = (x_1^*, \ldots, x_n^*)$, and $R^*$ is a new $n$-dimensional region.
Thus, the transformation (71) carries the functional $J[u(x)]$ into

$$J[u^*(x^*)] = \int_{R^*} F(x^*, u^*, \nabla^* u^*)\,dx^*,$$

where

$$\nabla^* u^* = (u^*_{x_1^*}, \ldots, u^*_{x_n^*}).$$

Our goal in this section is to calculate the variation of the functional (70)
corresponding to the transformation from $x$, $u(x)$ to $x^*$, $u^*(x^*)$, i.e., the
principal linear part (relative to $\varepsilon$) of the difference

$$J[u^*(x^*)] - J[u(x)]. \tag{73}$$


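37.2. Calculation of $\delta x_i$ and $\delta u$. As in the proof of Noether’s theorem for
one-dimensional regions (see p. 82), suppose $\varepsilon$ is a small quantity. Then,
by Taylor’s theorem, we have

$$\begin{aligned}
x_i^* &= \Phi_i(x, u, \nabla u; \varepsilon) = \Phi_i(x, u, \nabla u; 0) + \varepsilon\left.\frac{\partial\Phi_i(x, u, \nabla u; \varepsilon)}{\partial\varepsilon}\right|_{\varepsilon=0} + o(\varepsilon),\\
u^* &= \Psi(x, u, \nabla u; \varepsilon) = \Psi(x, u, \nabla u; 0) + \varepsilon\left.\frac{\partial\Psi(x, u, \nabla u; \varepsilon)}{\partial\varepsilon}\right|_{\varepsilon=0} + o(\varepsilon),
\end{aligned}$$

or using (72),

$$\begin{aligned}
x_i^* &= x_i + \varepsilon\varphi_i(x, u, \nabla u) + o(\varepsilon),\\
u^* &= u + \varepsilon\psi(x, u, \nabla u) + o(\varepsilon),
\end{aligned} \tag{74}$$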


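17 These formulas, with $n$ independent variables and 1 unknown function, should be
contrasted with the formulas (45) of Sec. 20, with $n$ unknown functions and 1 independent
variable.

where

$$\varphi_i(x, u, \nabla u) = \left.\frac{\partial\Phi_i(x, u, \nabla u; \varepsilon)}{\partial\varepsilon}\right|_{\varepsilon=0}, \qquad
\psi(x, u, \nabla u) = \left.\frac{\partial\Psi(x, u, \nabla u; \varepsilon)}{\partial\varepsilon}\right|_{\varepsilon=0}. \tag{75}$$

For a given surface $\sigma$, with equation $u = u(x)$, (74) leads to the increments

$$\Delta x_i = x_i^* - x_i = \varepsilon\varphi_i(x) + o(\varepsilon) \tag{76}$$

and

$$\Delta u = u^*(x^*) - u(x) = \varepsilon\psi(x) + o(\varepsilon), \tag{77}$$

where we explicitly indicate the arguments $x$ and $x^*$ at which the functions
$u$ and $u^*$ are evaluated, and $\varphi_i(x)$, $\psi(x)$ denote the functions (75) with $u$, $\nabla u$
replaced by $u(x)$, $\nabla u(x)$. Formula (77) gives an expression for the change in
$u$-coordinate as we go from the point $(x, u(x))$ on the surface $\sigma$ to its image
$(x^*, u^*(x^*))$ under the transformation (74). The variations $\delta x_i$ and $\delta u$
corresponding to (74) are defined as the principal linear parts (relative to $\varepsilon$)
of the increments (76) and (77), i.e.,

$$\delta x_i = \varepsilon\varphi_i(x), \qquad \delta u = \varepsilon\psi(x). \tag{78}$$

We must also consider the increment

$$\overline{\Delta u} = u^*(x) - u(x),$$

i.e., the change in $u$-coordinate as we go from the point $(x, u(x))$ to the
point $(x, u^*(x))$ on the surface $\sigma^*$ with the same $x$-coordinate, where $\sigma^*$ is the
image of the surface $\sigma$ under the transformation (74). Imitating (77) and
(78), we introduce a new function $\bar\psi(x)$ and a corresponding variation $\overline{\delta u}$:

$$\overline{\Delta u} = u^*(x) - u(x) = \varepsilon\bar\psi(x) + o(\varepsilon), \qquad \overline{\delta u} = \varepsilon\bar\psi(x).$$

To find the relation between $\psi$ and $\bar\psi$, or equivalently, between $\delta u$ and $\overline{\delta u}$,
we write

$$\begin{aligned}
\Delta u &= u^*(x^*) - u(x) = [u^*(x^*) - u^*(x)] + [u^*(x) - u(x)]\\
&= \sum_{i=1}^n \frac{\partial u^*}{\partial x_i}(x_i^* - x_i) + \overline{\delta u} + o(\varepsilon)\\
&= \sum_{i=1}^n \frac{\partial u^*}{\partial x_i}\,\delta x_i + \overline{\delta u} + o(\varepsilon).
\end{aligned} \tag{79}$$

Since $\partial u^*/\partial x_i$ and $\partial u/\partial x_i$ differ only by a quantity of order $\varepsilon$, (79) becomes

$$\Delta u \sim \sum_{i=1}^n u_{x_i}\,\delta x_i + \overline{\delta u},$$

where the symbol $\sim$ denotes equality except for terms of order higher than 1
relative to $\varepsilon$. But $\Delta u \sim \delta u$, since $\delta u$ is the principal part of $\Delta u$, and hence

$$\delta u = \overline{\delta u} + \sum_{i=1}^n u_{x_i}\,\delta x_i. \tag{80}$$

Moreover, since

$$\delta u = \varepsilon\psi, \qquad \overline{\delta u} = \varepsilon\bar\psi, \qquad \delta x_i = \varepsilon\varphi_i,$$

(80) also implies

$$\psi = \bar\psi + \sum_{i=1}^n u_{x_i}\varphi_i. \tag{81}$$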


Example. Let $u$ be a function of a single independent variable $x$, and let
(71) be the transformation

$$\begin{aligned}
x^* &= x\cos\varepsilon - u(x)\sin\varepsilon = x - \varepsilon u(x) + o(\varepsilon),\\
u^*(x^*) &= x\sin\varepsilon + u(x)\cos\varepsilon = \varepsilon x + u(x) + o(\varepsilon),
\end{aligned} \tag{82}$$

i.e., a counterclockwise rotation of the $xu$-plane through the small angle $\alpha = \varepsilon$.
As shown in Figure 10, (82) carries the point $(x, u(x))$ on the curve $\gamma$ with
equation $u = u(x)$ into the point $(x^*, u^*(x^*))$
on its image $\gamma^*$ with equation $u^* = u^*(x^*)$.

It follows from (82) that

$$\delta x = -\varepsilon u(x), \qquad \delta u = \varepsilon x \tag{83}$$

and

$$\varphi(x) = -u(x), \qquad \psi(x) = x. \tag{84}$$

In fact, the expressions (83) can be read
directly off the figure, as the components of
the vector joining the point $(x, u(x))$ to the
point $(x^*, u^*(x^*))$. Moreover,

$$u^*(x) = u^*[x^* + \varepsilon u(x)] + o(\varepsilon) = u^*(x^*) + \varepsilon u(x)\,{u^*}'(x^*) + o(\varepsilon),$$

and since ${u^*}'(x^*)$ and $u'(x)$ differ only by a quantity of order $\varepsilon$, we have

$$u^*(x) = u^*(x^*) + \varepsilon u(x)\,u'(x) + o(\varepsilon).$$

On the other hand, according to the second of the formulas (82),

$$u^*(x^*) = \varepsilon x + u(x) + o(\varepsilon).$$

It follows that

$$\overline{\Delta u} = u^*(x) - u(x) = \varepsilon[x + u'(x)\,u(x)] + o(\varepsilon)$$

and

$$\overline{\delta u} = \varepsilon[x + u(x)\,u'(x)], \qquad \bar\psi(x) = x + u(x)\,u'(x). \tag{85}$$

Using (83) and (84), we can write (85) as

$$\delta u = \overline{\delta u} + u'\,\delta x, \qquad \psi = \bar\psi + u'\varphi,$$

in complete agreement with (80) and (81).
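
The rotation example lends itself to a direct numerical check. The sketch below assumes the sample curve $u = x^2$ (a hypothetical choice), rotates it through a small angle, evaluates the rotated curve at the same abscissa $x$ by solving for the preimage with Newton's method, and compares the difference quotient $[u^*(x) - u(x)]/\varepsilon$ with the predicted value $x + u(x)\,u'(x)$.

```python
import math

def u(x):        # hypothetical sample curve u = x^2
    return x * x

def u_prime(x):
    return 2.0 * x

def rotated_curve_at(x, eps):
    """Value u*(x) of the rotated curve at abscissa x.

    Solves x0*cos(eps) - u(x0)*sin(eps) = x for the preimage x0 by Newton's
    method, then returns x0*sin(eps) + u(x0)*cos(eps), following (82).
    """
    x0 = x
    for _ in range(50):
        g = x0 * math.cos(eps) - u(x0) * math.sin(eps) - x
        dg = math.cos(eps) - u_prime(x0) * math.sin(eps)
        x0 -= g / dg
    return x0 * math.sin(eps) + u(x0) * math.cos(eps)

eps = 1e-5
results = []
for x in (0.0, 0.5, 1.0):
    ratio = (rotated_curve_at(x, eps) - u(x)) / eps   # estimate of psi-bar(x)
    predicted = x + u(x) * u_prime(x)                 # x + u(x) u'(x)
    results.append((x, ratio, predicted))
    print(f"x={x:.1f}  estimate={ratio:.6f}  predicted={predicted:.6f}")
```

The two columns agree up to terms of order $\varepsilon$, as expected for a principal linear part.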
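37.3. Calculation of $\delta u_{x_i}$. We now derive an expression for the quantity

$$\Delta u_{x_i} = \frac{\partial u^*(x^*)}{\partial x_i^*} - \frac{\partial u(x)}{\partial x_i},$$

or more precisely, its principal part $\delta u_{x_i}$, which will be required later when
we calculate the increment (73). First, we note that according to (74),$^{18}$

$$\frac{\partial x_k^*}{\partial x_i} = \delta_{ik} + \varepsilon\frac{\partial\varphi_k}{\partial x_i}, \tag{86}$$

where $\delta_{ik}$ is the Kronecker delta, equal to 1 if $i = k$ and 0 otherwise. It
follows that

$$\frac{\partial}{\partial x_i} = \sum_{k=1}^n \frac{\partial x_k^*}{\partial x_i}\frac{\partial}{\partial x_k^*}
 = \frac{\partial}{\partial x_i^*} + \varepsilon\sum_{k=1}^n \frac{\partial\varphi_k}{\partial x_i}\frac{\partial}{\partial x_k^*},$$

or

$$\frac{\partial}{\partial x_i^*} = \frac{\partial}{\partial x_i} - \varepsilon\sum_{k=1}^n \frac{\partial\varphi_k}{\partial x_i}\frac{\partial}{\partial x_k^*}. \tag{87}$$

Next we write

$$\begin{aligned}
\Delta u_{x_i} &= \frac{\partial u^*(x^*)}{\partial x_i^*} - \frac{\partial u(x)}{\partial x_i}\\
&= \frac{\partial[u^*(x^*) - u(x^*)]}{\partial x_i^*} + \frac{\partial[u(x^*) - u(x)]}{\partial x_i} + \left(\frac{\partial}{\partial x_i^*} - \frac{\partial}{\partial x_i}\right)u(x^*),
\end{aligned}$$

and analyze each of the three terms in the right-hand side separately. Using
(87) and the fact that

$$u^*(x^*) - u(x^*) \sim \varepsilon\bar\psi(x^*),$$

we have

$$\frac{\partial[u^*(x^*) - u(x^*)]}{\partial x_i^*} \sim \frac{\partial[u^*(x^*) - u(x^*)]}{\partial x_i} \sim \varepsilon\frac{\partial\bar\psi(x^*)}{\partial x_i} \sim \varepsilon\frac{\partial\bar\psi(x)}{\partial x_i}. \tag{88}$$

Moreover, it is easily verified that

$$\frac{\partial[u(x^*) - u(x)]}{\partial x_i} \sim \varepsilon\frac{\partial}{\partial x_i}\sum_{k=1}^n \frac{\partial u(x)}{\partial x_k}\,\varphi_k(x) \tag{89}$$

and

$$\left(\frac{\partial}{\partial x_i^*} - \frac{\partial}{\partial x_i}\right)u(x^*) \sim -\varepsilon\sum_{k=1}^n \frac{\partial u(x)}{\partial x_k}\frac{\partial\varphi_k}{\partial x_i}. \tag{90}$$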


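18 In expressions like $\partial\varphi_k/\partial x_i$, $u$ is regarded as a function, i.e., the value of $u$ is not held
fixed, as might be inferred from the somewhat ambiguous notation for partial derivatives.
Actually, $\partial\varphi_k/\partial x_i$ means the total derivative of $\varphi_k[x, u(x), \nabla u(x)]$ with respect to $x_i$.

Adding equations (88), (89) and (90), we obtain

$$\Delta u_{x_i} = \frac{\partial u^*(x^*)}{\partial x_i^*} - \frac{\partial u(x)}{\partial x_i} \sim \varepsilon\left(\frac{\partial\bar\psi}{\partial x_i} + \sum_{k=1}^n \frac{\partial^2 u}{\partial x_i\,\partial x_k}\,\varphi_k\right). \tag{91}$$

Finally, recalling that

$$\delta u_{x_i} \sim \Delta u_{x_i}, \qquad \overline{\delta u} = \varepsilon\bar\psi, \qquad \delta x_k = \varepsilon\varphi_k,$$

we can write (91) as

$$\delta u_{x_i} = (\overline{\delta u})_{x_i} + \sum_{k=1}^n u_{x_i x_k}\,\delta x_k. \tag{92}$$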


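37.4. Calculation of $\delta J$. We are now in a position to calculate the variation
of a functional defined on a variable domain.

Theorem 1. The variation of the functional

$$J[u] = \int_R F(x, u, \nabla u)\,dx \tag{93}$$

corresponding to the transformation$^{19}$

$$\begin{aligned}
x_i^* &= \Phi_i(x, u, \nabla u; \varepsilon) \sim x_i + \varepsilon\varphi_i(x, u, \nabla u),\\
u^* &= \Psi(x, u, \nabla u; \varepsilon) \sim u + \varepsilon\psi(x, u, \nabla u)
\end{aligned} \tag{94}$$

($i = 1, \ldots, n$) is given by the formula

$$\delta J = \varepsilon\int_R \left(F_u - \sum_{i=1}^n \frac{\partial}{\partial x_i}F_{u_{x_i}}\right)\bar\psi\,dx
 + \varepsilon\int_R \sum_{i=1}^n \frac{\partial}{\partial x_i}\left(F_{u_{x_i}}\bar\psi + F\varphi_i\right)dx, \tag{95}$$

where

$$\bar\psi = \psi - \sum_{i=1}^n u_{x_i}\varphi_i.$$

Proof. Here, $\delta J$ means the principal linear part (relative to $\varepsilon$) of
the increment

$$\Delta J = J[u^*(x^*)] - J[u(x)], \tag{96}$$

where $u^*(x^*)$ is the image of $u(x)$ under the transformation (94). By
definition, (96) equals

$$\begin{aligned}
\Delta J &= \int_{R^*} F(x^*, u^*, \nabla^* u^*)\,dx^* - \int_R F(x, u, \nabla u)\,dx\\
&= \int_R \left[F(x^*, u^*, \nabla^* u^*)\,\frac{\partial(x_1^*, \ldots, x_n^*)}{\partial(x_1, \ldots, x_n)} - F(x, u, \nabla u)\right] dx,
\end{aligned} \tag{97}$$

where

$$\frac{\partial(x_1^*, \ldots, x_n^*)}{\partial(x_1, \ldots, x_n)}$$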


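19 As usual, the symbol $\sim$ denotes equality except for terms of order higher than 1
relative to $\varepsilon$.

is the Jacobian of the transformation from the variables $x_1, \ldots, x_n$ to
the variables $x_1^*, \ldots, x_n^*$. According to (86), this Jacobian is

$$\begin{vmatrix}
1 + \varepsilon\dfrac{\partial\varphi_1}{\partial x_1} & \varepsilon\dfrac{\partial\varphi_2}{\partial x_1} & \cdots & \varepsilon\dfrac{\partial\varphi_n}{\partial x_1}\\[2ex]
\varepsilon\dfrac{\partial\varphi_1}{\partial x_2} & 1 + \varepsilon\dfrac{\partial\varphi_2}{\partial x_2} & \cdots & \varepsilon\dfrac{\partial\varphi_n}{\partial x_2}\\[1ex]
\vdots & \vdots & & \vdots\\[1ex]
\varepsilon\dfrac{\partial\varphi_1}{\partial x_n} & \varepsilon\dfrac{\partial\varphi_2}{\partial x_n} & \cdots & 1 + \varepsilon\dfrac{\partial\varphi_n}{\partial x_n}
\end{vmatrix}
\sim 1 + \varepsilon\sum_{i=1}^n \frac{\partial\varphi_i}{\partial x_i},$$

and hence we can write (97) as

$$\Delta J \sim \int_R \left\{F(x^*, u^*, \nabla^* u^*)\left(1 + \varepsilon\sum_{i=1}^n \frac{\partial\varphi_i}{\partial x_i}\right) - F(x, u, \nabla u)\right\} dx. \tag{98}$$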
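
The first-order expansion of the Jacobian used above, namely that the determinant of the matrix with entries $\delta_{ik} + \varepsilon\,\partial\varphi_k/\partial x_i$ equals $1 + \varepsilon\sum_i \partial\varphi_i/\partial x_i$ up to terms of order $\varepsilon^2$, can be checked on a small example. The $3 \times 3$ matrix of derivative values below is an arbitrary assumption, and the helper names are hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

# Hypothetical values of the derivatives d(phi_k)/d(x_i) at one point, n = 3.
dphi = [[0.4, -1.0, 2.0],
        [1.5, 0.3, -0.7],
        [-0.2, 0.9, 1.1]]
trace = sum(dphi[i][i] for i in range(3))

def jacobian(eps):
    """det(delta_ik + eps * dphi_ik), the Jacobian implied by (86)."""
    return det3([[(1.0 if i == k else 0.0) + eps * dphi[i][k] for k in range(3)]
                 for i in range(3)])

errors = {eps: abs(jacobian(eps) - (1.0 + eps * trace)) for eps in (1e-2, 1e-3, 1e-4)}
for eps, err in errors.items():
    print(f"eps={eps:g}  |J - (1 + eps*trace)| = {err:.3e}")
```

Reducing $\varepsilon$ by a factor of 10 reduces the discrepancy by roughly a factor of 100, consistent with a remainder of order $\varepsilon^2$.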

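Using Taylor’s theorem to expand the integrand of (98), and retaining
only terms of order 1 relative to $\varepsilon$, we find that

$$\delta J = \int_R \left[\sum_{i=1}^n F_{x_i}\,\delta x_i + F_u\,\delta u + \sum_{i=1}^n F_{u_{x_i}}\,\delta u_{x_i} + \varepsilon F\sum_{i=1}^n \frac{\partial\varphi_i}{\partial x_i}\right] dx. \tag{99}$$

Then, since $\delta x_i = \varepsilon\varphi_i$, substitution of (80) and (92) into (99) gives

$$\begin{aligned}
\delta J = \int_R \Big[ & \sum_{i=1}^n F_{x_i}\,\delta x_i + F_u\,\overline{\delta u} + F_u\sum_{i=1}^n u_{x_i}\,\delta x_i + \sum_{i=1}^n F_{u_{x_i}}(\overline{\delta u})_{x_i}\\
& + \sum_{i,k=1}^n F_{u_{x_i}}u_{x_i x_k}\,\delta x_k + F\sum_{i=1}^n \frac{\partial}{\partial x_i}(\delta x_i)\Big]\,dx.
\end{aligned} \tag{100}$$

As in the case of a fixed domain $R$, we try to represent the integrand
of (100) as an expression of the form$^{20}$

$$G(x)\,\overline{\delta u} + \operatorname{div}(\cdots)$$

(cf. p. 153). This can be achieved by noting that

$$\sum_{i=1}^n \frac{\partial}{\partial x_i}(F\,\delta x_i) = \sum_{i=1}^n F_{x_i}\,\delta x_i + F\sum_{i=1}^n \frac{\partial}{\partial x_i}(\delta x_i)
 + \sum_{i=1}^n F_u u_{x_i}\,\delta x_i + \sum_{i,k=1}^n F_{u_{x_i}}u_{x_i x_k}\,\delta x_k$$

and

$$\sum_{i=1}^n F_{u_{x_i}}(\overline{\delta u})_{x_i} = \sum_{i=1}^n \frac{\partial}{\partial x_i}\left(F_{u_{x_i}}\,\overline{\delta u}\right) - \sum_{i=1}^n \left(\frac{\partial}{\partial x_i}F_{u_{x_i}}\right)\overline{\delta u}.$$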

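20 Then, because of the $n$-dimensional version of Green’s theorem [see formula (5)],
the second term of (101) can be transformed into a surface integral.

(The last formula resembles an integration by parts.) Thus, finally,
we have

$$\delta J = \int_R \left(F_u - \sum_{i=1}^n \frac{\partial}{\partial x_i}F_{u_{x_i}}\right)\overline{\delta u}\,dx
 + \int_R \sum_{i=1}^n \frac{\partial}{\partial x_i}\left(F_{u_{x_i}}\,\overline{\delta u} + F\,\delta x_i\right)dx, \tag{101}$$

which is the same as formula (95), since $\overline{\delta u} = \varepsilon\bar\psi$, $\delta x_i = \varepsilon\varphi_i$. This
proves the theorem.

Remark 1. In the special case where the function $u$ and its derivatives
are varied, but not the independent variables $x_i$, we have

$$\varphi_i = 0, \qquad \bar\psi = \psi - \sum_{i=1}^n u_{x_i}\varphi_i = \psi,$$

and (95) becomes

$$\delta J = \varepsilon\int_R \left(F_u - \sum_{i=1}^n \frac{\partial}{\partial x_i}F_{u_{x_i}}\right)\psi\,dx
 + \varepsilon\int_R \sum_{i=1}^n \frac{\partial}{\partial x_i}\left(F_{u_{x_i}}\psi\right)dx,$$

which is identical with formula (4) of Sec. 35.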


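Remark 2. The formula for the variation of the functional $J[u]$ is ordinarily
used in the case where $u = u(x)$ is an extremal surface of $J[u]$, i.e., satisfies
the Euler equation

$$F_u - \sum_{i=1}^n \frac{\partial}{\partial x_i}F_{u_{x_i}} = 0.$$

Then (95) reduces to

$$\delta J = \varepsilon\int_R \sum_{i=1}^n \frac{\partial}{\partial x_i}\left(F_{u_{x_i}}\bar\psi + F\varphi_i\right)dx$$

in the general case, and to

$$\delta J = \varepsilon\int_R \sum_{i=1}^n \frac{\partial}{\partial x_i}\left(F_{u_{x_i}}\psi\right)dx$$

in the case where the independent variables $x_i$ are not varied.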

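Remark 3. Consider the functional

$$
J[u_1, \ldots, u_m] = \int_R F\left( x, u_1, \ldots, u_m, \frac{\partial u_1}{\partial x_1}, \ldots, \frac{\partial u_m}{\partial x_n} \right) dx \tag{102}
$$

involving $m$ unknown functions $u_1, \ldots, u_m$ and their derivatives

$$
\frac{\partial u_j}{\partial x_i} \qquad (i = 1, \ldots, n;\ j = 1, \ldots, m). \tag{103}
$$

Introducing the vector $u = (u_1, \ldots, u_m)$ and interpreting $\nabla u$ as the tensor
with components (103), we can still write (102) in the form

$$
J[u] = \int_R F(x, u, \nabla u)\, dx.
$$

Then, if (94) is replaced by the transformation

$$
\begin{aligned}
x_i^* &= \Phi_i(x, u, \nabla u; \varepsilon) \sim x_i + \varepsilon \varphi_i(x, u, \nabla u) \qquad (i = 1, \ldots, n), \\
u_j^* &= \Psi_j(x, u, \nabla u; \varepsilon) \sim u_j + \varepsilon \psi_j(x, u, \nabla u) \qquad (j = 1, \ldots, m),
\end{aligned} \tag{104}
$$

the formula (95) generalizes to

$$
\delta J = \varepsilon \int_R \sum_{j=1}^{m} \left( F_{u_j} - \sum_{i=1}^{n} \frac{\partial}{\partial x_i} F_{\partial u_j / \partial x_i} \right) \bar\psi_j\, dx + \varepsilon \int_R \sum_{i=1}^{n} \frac{\partial}{\partial x_i} \left( \sum_{j=1}^{m} F_{\partial u_j / \partial x_i} \bar\psi_j + F \varphi_i \right) dx, \tag{105}
$$

where

$$
\bar\psi_j = \psi_j - \sum_{i=1}^{n} \frac{\partial u_j}{\partial x_i} \varphi_i \qquad (j = 1, \ldots, m).
$$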


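Remark 4. Let (104) be replaced by the more general transformation

$$
\begin{aligned}
x_i^* &= \Phi_i(x, u, \nabla u; \varepsilon) \sim x_i + \sum_{k=1}^{r} \varepsilon_k \varphi_i^{(k)}(x, u, \nabla u) \qquad (i = 1, \ldots, n), \\
u_j^* &= \Psi_j(x, u, \nabla u; \varepsilon) \sim u_j + \sum_{k=1}^{r} \varepsilon_k \psi_j^{(k)}(x, u, \nabla u) \qquad (j = 1, \ldots, m),
\end{aligned}
$$

depending on $r$ parameters $\varepsilon_1, \ldots, \varepsilon_r$, where $\varepsilon$ means the vector $(\varepsilon_1, \ldots, \varepsilon_r)$
and the symbol $\sim$ denotes equality except for quantities of order higher than
1 relative to $\varepsilon_1, \ldots, \varepsilon_r$. Then, formula (105) generalizes further to

$$
\begin{aligned}
\delta J = {} & \sum_{k=1}^{r} \varepsilon_k \int_R \sum_{j=1}^{m} \left( F_{u_j} - \sum_{i=1}^{n} \frac{\partial}{\partial x_i} F_{\partial u_j / \partial x_i} \right) \bar\psi_j^{(k)}\, dx \\
& + \sum_{k=1}^{r} \varepsilon_k \int_R \sum_{i=1}^{n} \frac{\partial}{\partial x_i} \left( \sum_{j=1}^{m} F_{\partial u_j / \partial x_i} \bar\psi_j^{(k)} + F \varphi_i^{(k)} \right) dx,
\end{aligned}
$$

where

$$
\bar\psi_j^{(k)} = \psi_j^{(k)} - \sum_{i=1}^{n} \frac{\partial u_j}{\partial x_i} \varphi_i^{(k)}.
$$
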

37.5. Noether’s theorem. Using formula (95) for the variation of a
functional, we can deduce an important theorem due to Noether, concerning
“invariant variational problems.” This theorem has already been proved
in Sec. 20 for the case of a single independent variable. Suppose we have a
functional

$$
J[u] = \int_R F(x, u, \nabla u)\, dx \tag{106}
$$

and a transformation

$$
x_i^* = \Phi_i(x, u, \nabla u), \qquad u^* = \Psi(x, u, \nabla u) \tag{107}
$$

($i = 1, \ldots, n$) carrying the surface $\sigma$ with equation $u = u(x)$ into the surface
$\sigma^*$ with equation $u^* = u^*(x^*)$, in the way described on p. 169.

Definition.$^{21}$ The functional (106) is said to be invariant under the
transformation (107) if $J[\sigma^*] = J[\sigma]$, i.e., if

$$
\int_{R^*} F(x^*, u^*, \nabla^* u^*)\, dx^* = \int_R F(x, u, \nabla u)\, dx.
$$

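Example. The functional

$$
J[\sigma] = \iint_R \left[ \left( \frac{\partial u}{\partial x} \right)^2 + \left( \frac{\partial u}{\partial y} \right)^2 \right] dx\, dy
$$

is invariant under the rotation

$$
\begin{aligned}
x^* &= x \cos \varepsilon - y \sin \varepsilon, \\
y^* &= x \sin \varepsilon + y \cos \varepsilon, \\
u^* &= u,
\end{aligned} \tag{108}
$$

where $\varepsilon$ is an arbitrary constant. In fact, since the inverse of the
transformation (108) is

$$
\begin{aligned}
x &= x^* \cos \varepsilon + y^* \sin \varepsilon, \\
y &= -x^* \sin \varepsilon + y^* \cos \varepsilon, \\
u &= u^*,
\end{aligned}
$$

it follows that, given a surface $\sigma$ with equation $u = u(x, y)$, the “transformed”
surface $\sigma^*$ has the equation

$$
u^* = u(x^* \cos \varepsilon + y^* \sin \varepsilon,\ -x^* \sin \varepsilon + y^* \cos \varepsilon) = u^*(x^*, y^*).
$$

Consequently, since the Jacobian of the rotation (108) equals 1, we have

$$
\begin{aligned}
J[\sigma^*] &= \iint_{R^*} \left[ \left( \frac{\partial u^*}{\partial x^*} \right)^2 + \left( \frac{\partial u^*}{\partial y^*} \right)^2 \right] dx^*\, dy^* \\
&= \iint_{R^*} \left[ \left( \frac{\partial u}{\partial x} \cos \varepsilon - \frac{\partial u}{\partial y} \sin \varepsilon \right)^2 + \left( \frac{\partial u}{\partial x} \sin \varepsilon + \frac{\partial u}{\partial y} \cos \varepsilon \right)^2 \right] dx^*\, dy^* \\
&= \iint_R \left[ \left( \frac{\partial u}{\partial x} \right)^2 + \left( \frac{\partial u}{\partial y} \right)^2 \right] dx\, dy = J[\sigma].
\end{aligned}
$$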
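The invariance claimed in this example can also be checked numerically. The following is only a rough sketch, not part of the original argument: the test surface `u`, the sample points and the step `h` are arbitrary choices made here for illustration. It compares the integrand $u_x^2 + u_y^2$ at a point $(x, y)$ with the integrand of the transformed surface at the rotated point $(x^*, y^*)$, using central differences; since the Jacobian of a rotation equals 1, pointwise agreement of the integrands is what the invariance of the integral rests on.

```python
import math

def u(x, y):
    # Arbitrary smooth test surface (an assumption made for illustration).
    return math.sin(x) * math.exp(0.5 * y) + x * y

def grad_sq(f, x, y, h=1e-5):
    # Central-difference estimate of f_x^2 + f_y^2 at (x, y).
    fx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    fy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return fx * fx + fy * fy

def rotated_surface(f, eps):
    # u*(x*, y*) = u(x* cos eps + y* sin eps, -x* sin eps + y* cos eps)
    c, s = math.cos(eps), math.sin(eps)
    return lambda xs, ys: f(xs * c + ys * s, -xs * s + ys * c)

def max_mismatch(eps, points):
    # Largest difference between the two integrands over the sample points.
    c, s = math.cos(eps), math.sin(eps)
    u_star = rotated_surface(u, eps)
    worst = 0.0
    for x, y in points:
        xs, ys = x * c - y * s, x * s + y * c  # the rotation (108)
        worst = max(worst, abs(grad_sq(u_star, xs, ys) - grad_sq(u, x, y)))
    return worst

points = [(0.3 * i - 1.0, 0.25 * j - 0.8) for i in range(7) for j in range(7)]
print(max_mismatch(0.7, points))  # small, limited only by finite-difference error
```

Any other smooth `u` and any angle should give the same qualitative result; a functional whose integrand depends on $u_x$ alone would not pass this check.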


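Theorem 2 (Noether). If the functional

$$
J[u] = \int_R F(x, u, \nabla u)\, dx \tag{109}
$$

is invariant under the family of transformations

$$
\begin{aligned}
x_i^* &= \Phi_i(x, u, \nabla u; \varepsilon) \sim x_i + \varepsilon \varphi_i(x, u, \nabla u), \\
u^* &= \Psi(x, u, \nabla u; \varepsilon) \sim u + \varepsilon \psi(x, u, \nabla u)
\end{aligned} \tag{110}
$$

($i = 1, \ldots, n$) for an arbitrary region $R$, then

$$
\sum_{i=1}^{n} \frac{\partial}{\partial x_i} \left( F_{u_{x_i}} \bar\psi + F \varphi_i \right) = 0 \tag{111}
$$

on each extremal surface of $J[u]$, where

$$
\bar\psi = \psi - \sum_{i=1}^{n} u_{x_i} \varphi_i.
$$

Proof. According to formula (95),

$$
\delta J = \varepsilon \int_R \sum_{i=1}^{n} \frac{\partial}{\partial x_i} \left( F_{u_{x_i}} \bar\psi + F \varphi_i \right) dx
$$

if $u = u(x)$ is an extremal surface. Since $J[u]$ is invariant under (110),
$\delta J = 0$, and since $R$ is arbitrary, this implies (111), as asserted.

21 Cf. the analogous definition on p. 80 and the subsequent examples.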

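Remark 1. If we drop the requirement that $u = u(x)$ be an extremal
surface of $J[u]$, then, using (95) again, we find that (111) is replaced by

$$
\sum_{i=1}^{n} \frac{\partial}{\partial x_i} \left( F_{u_{x_i}} \bar\psi + F \varphi_i \right) = -\left( F_u - \sum_{i=1}^{n} \frac{\partial}{\partial x_i} F_{u_{x_i}} \right) \bar\psi.
$$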

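Remark 2. If there are $m$ unknown functions $u_1, \ldots, u_m$, we introduce
the vector $u = (u_1, \ldots, u_m)$ and continue to write (109), as in Remark 3,
p. 175. Then invariance of $J[u]$ under the family of transformations

$$
\begin{aligned}
x_i^* &= \Phi_i(x, u, \nabla u; \varepsilon) \sim x_i + \varepsilon \varphi_i(x, u, \nabla u) \qquad (i = 1, \ldots, n), \\
u_j^* &= \Psi_j(x, u, \nabla u; \varepsilon) \sim u_j + \varepsilon \psi_j(x, u, \nabla u) \qquad (j = 1, \ldots, m)
\end{aligned}
$$

implies that

$$
\sum_{i=1}^{n} \frac{\partial}{\partial x_i} \left( \sum_{j=1}^{m} F_{\partial u_j / \partial x_i} \bar\psi_j + F \varphi_i \right) = 0, \tag{112}
$$

where

$$
\bar\psi_j = \psi_j - \sum_{i=1}^{n} \frac{\partial u_j}{\partial x_i} \varphi_i.
$$

When $n = 1$, (112) reduces to

$$
\frac{d}{dx} \left( \sum_{j=1}^{m} F_{u_j'} \bar\psi_j + F \varphi \right) = 0
$$

or

$$
\sum_{j=1}^{m} F_{u_j'} \psi_j + \left( F - \sum_{j=1}^{m} u_j' F_{u_j'} \right) \varphi = \text{const} \tag{113}
$$

along each extremal. This is precisely the version of Noether’s theorem
proved in Sec. 20. In other words, the left-hand side of (113) is a first
integral of the system of Euler equations

$$
F_{u_j} - \frac{d}{dx} F_{u_j'} = 0 \qquad (j = 1, \ldots, m).
$$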
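To make (113) concrete, here is a small numerical sketch under assumptions chosen purely for illustration: $m = 1$, the integrand $F = \tfrac12 (u'^2 - u^2)$, which does not depend explicitly on $x$, and the shift $x^* = x + \varepsilon$, $u^* = u$, so that $\varphi = 1$ and $\psi = 0$. The extremals are then $u = A \cos(x + c)$, and the left-hand side of (113) should take the same value at every point of an extremal.

```python
import math

def F(u, du):
    # Illustrative integrand F(x, u, u') = (u'^2 - u^2) / 2, with no explicit x.
    return 0.5 * (du * du - u * u)

def F_du(u, du):
    # Partial derivative of F with respect to u'.
    return du

def extremal(x, amp=1.3, phase=0.4):
    # u = amp * cos(x + phase) satisfies the Euler equation -u - u'' = 0.
    return amp * math.cos(x + phase), -amp * math.sin(x + phase)

def first_integral(x, phi=1.0, psi=0.0):
    # Left-hand side of (113) for m = 1: F_{u'} psi + (F - u' F_{u'}) phi.
    u, du = extremal(x)
    return F_du(u, du) * psi + (F(u, du) - du * F_du(u, du)) * phi

values = [first_integral(0.1 * k) for k in range(50)]
print(min(values), max(values))  # the two numbers agree up to rounding
```

For this choice the conserved quantity is $-\tfrac12 (u'^2 + u^2)$, i.e., minus the energy of a harmonic oscillator, which is the familiar consequence of invariance under shifts of the independent variable.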


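Remark 3. Invariance of the functional (109) under the $r$-parameter family
of transformations (see Remark 4, p. 176)

$$
\begin{aligned}
x_i^* &= \Phi_i(x, u, \nabla u; \varepsilon) \sim x_i + \sum_{k=1}^{r} \varepsilon_k \varphi_i^{(k)}(x, u, \nabla u) \qquad (i = 1, \ldots, n), \\
u_j^* &= \Psi_j(x, u, \nabla u; \varepsilon) \sim u_j + \sum_{k=1}^{r} \varepsilon_k \psi_j^{(k)}(x, u, \nabla u) \qquad (j = 1, \ldots, m)
\end{aligned}
$$

implies the existence of $r$ linearly independent relations

$$
\sum_{i=1}^{n} \frac{\partial}{\partial x_i} \left( \sum_{j=1}^{m} F_{\partial u_j / \partial x_i} \bar\psi_j^{(k)} + F \varphi_i^{(k)} \right) = 0 \qquad (k = 1, \ldots, r), \tag{114}
$$

where

$$
\bar\psi_j^{(k)} = \psi_j^{(k)} - \sum_{i=1}^{n} \frac{\partial u_j}{\partial x_i} \varphi_i^{(k)} \qquad (j = 1, \ldots, m).
$$
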

Remark 4. Suppose the functional $J[u]$ is invariant under a family of
transformations depending on $r$ arbitrary functions instead of $r$ arbitrary
parameters. Then, according to another theorem of Noether (which will
not be proved here), there are $r$ identities connecting the left-hand sides of
the Euler equations corresponding to $J[u]$. For example, consider the
simplest variational problem in parametric form, involving a functional

$$
J[x, y] = \int_{t_0}^{t_1} \Phi(x, y, \dot x, \dot y)\, dt, \tag{115}
$$

where $\Phi$ is a positive-homogeneous function of degree 1 in $\dot x(t)$ and $\dot y(t)$
(see Sec. 10). Then, as already noted on p. 39, $J[x, y]$ does not change if
we introduce a new parameter $\tau$ by setting $t = t(\tau)$, where $dt/d\tau > 0$, and
in fact, the left-hand sides of the Euler equations

$$
\Phi_x - \frac{d}{dt} \Phi_{\dot x} = 0, \qquad \Phi_y - \frac{d}{dt} \Phi_{\dot y} = 0
$$

corresponding to (115) are connected by the identity

$$
\dot x \left( \Phi_x - \frac{d}{dt} \Phi_{\dot x} \right) + \dot y \left( \Phi_y - \frac{d}{dt} \Phi_{\dot y} \right) = 0.
$$

Another interesting example of a family of transformations depending
on an arbitrary function, i.e., the gauge transformations of electrodynamics,
will be given in Sec. 38.5.
38. Applications to Field Theory

38.1. The principle of stationary action for fields. In Sec. 36, we discussed
the application of the principle of stationary action to vibrating systems with
infinitely many degrees of freedom. These systems were characterized by
a function $u(x, t)$ or $u(x, y, t)$ giving the transverse displacement of the system
from its equilibrium position. More generally, consider a physical system
(not necessarily mechanical) characterized by one function

$$
u(t, x_1, \ldots, x_n) \tag{116}
$$

or by a set of functions

$$
u_j(t, x_1, \ldots, x_n) \qquad (j = 1, \ldots, m),
$$

depending on the time $t$ and the space coordinates $x_1, \ldots, x_n$.$^{22}$ Such a
system is called a field [not to be confused with the concept of a field (of
directions) treated in Chap. 6], and the functions $u_j$ are called the field
functions. As usual, we can simplify the notation by interpreting (116) as
a vector function $u = (u_1, \ldots, u_m)$ in the case where $m > 1$. It is also
convenient to write

$$
t = x_0, \qquad x = (x_0, x_1, \ldots, x_n), \qquad dx = dx_0\, dx_1 \cdots dx_n.
$$

Then the field function (116) becomes simply $u(x)$.

In the case of the simple vibrating systems studied in Sec. 36, the equations
of motion for the system were derived by first calculating the action functional

$$
\int_a^b (T - U)\, dt,
$$

where $T$ is the kinetic energy and $U$ the potential energy of the system, and
then invoking the principle of stationary action. Similarly, the equations of
many other physical fields can be derived from a suitably defined action functional.
By analogy with the vibrating string and the vibrating membrane, we write
the action in the form$^{23}$

$$
J[u] = \int_a^b L(u, \nabla u)\, dx_0 = \int_a^b dx_0 \int \cdots \int_R \mathscr{L}(u, \nabla u)\, dx_1 \cdots dx_n = \int_\Omega \mathscr{L}(u, \nabla u)\, dx, \tag{117}
$$

22 We deliberately write the argument $t$ first, since it will soon be denoted by $x_0$.
In physical problems, $n$ can only take the values 1, 2 or 3. However, the choice of $m$
is not restricted, corresponding to the possibility of scalar fields, vector fields, tensor
fields, etc.

23 The aptness of this way of writing the action will be apparent from the examples.
In the treatment of vibrating systems given in Sec. 36, we did not explicitly introduce
the functions $L = T - U$ and $\mathscr{L}$. Of course, in some cases, e.g., the vibrating plate,
$\mathscr{L}$ must involve higher-order derivatives.
where $\nabla$ is the operator

$$
\nabla = \left( \frac{\partial}{\partial x_0}, \frac{\partial}{\partial x_1}, \ldots, \frac{\partial}{\partial x_n} \right),
$$

$R$ is some $n$-dimensional region, and $\Omega$ is the “cylindrical space-time region”
$R \times [a, b]$, i.e., the Cartesian product of $R$ and the interval $[a, b]$ (see footnote
10, p. 164). The functions $L(u, \nabla u)$ and $\mathscr{L}(u, \nabla u)$ are called the Lagrangian
and Lagrangian density of the field, respectively. Applying the principle of
stationary action to (117), we require that $\delta J = 0$. This leads to the Euler
equations

$$
\frac{\partial \mathscr{L}}{\partial u_j} - \sum_{i=0}^{n} \frac{\partial}{\partial x_i} \frac{\partial \mathscr{L}}{\partial (\partial u_j / \partial x_i)} = 0 \qquad (j = 1, \ldots, m), \tag{118}
$$

which are the desired field equations.

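Example 1. For the vibrating string with free ends ($\kappa_1 = \kappa_2 = 0$), we
have $m = n = 1$, and

$$
\mathscr{L} = \tfrac{1}{2} \left( \rho u_t^2 - \tau u_x^2 \right) = \tfrac{1}{2} \left( \rho u_{x_0}^2 - \tau u_{x_1}^2 \right)
$$

[cf. formula (16)].

Example 2. For the vibrating membrane with a free boundary [$\kappa(s) = 0$]
we have $m = 1$, $n = 2$, and

$$
\mathscr{L} = \tfrac{1}{2} \left[ \rho u_t^2 - \tau (u_x^2 + u_y^2) \right] = \tfrac{1}{2} \left[ \rho u_{x_0}^2 - \tau (u_{x_1}^2 + u_{x_2}^2) \right]
$$

[cf. formula (42)].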

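Example 3. Consider the Klein-Gordon equation

$$
(\Box - M^2)\, u(x) = 0, \tag{119}
$$

describing the scalar field corresponding to uncharged particles of mass $M$
with spin zero (e.g., $\pi^0$-mesons). Here, $\Box$ denotes the D’Alembertian (operator)

$$
\Box = -\frac{\partial^2}{\partial x_0^2} + \frac{\partial^2}{\partial x_1^2} + \frac{\partial^2}{\partial x_2^2} + \frac{\partial^2}{\partial x_3^2}.
$$

It is easy to see that (119) is the Euler equation corresponding to the Lagrangian
density

$$
\mathscr{L} = \tfrac{1}{2} \left( u_{x_0}^2 - u_{x_1}^2 - u_{x_2}^2 - u_{x_3}^2 - M^2 u^2 \right). \tag{120}
$$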
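As a quick sanity check on the sign conventions in (119), the sketch below evaluates $(\Box - M^2)u$ by second-order central differences for a plane wave $u = \cos(k_1 x_1 + k_2 x_2 + k_3 x_3 - \omega x_0)$. The plane wave, the numerical values of `M` and `K`, and the step `h` are assumptions introduced here for illustration only. With $\Box = -\partial^2/\partial x_0^2 + \partial^2/\partial x_1^2 + \partial^2/\partial x_2^2 + \partial^2/\partial x_3^2$, the residual should vanish exactly when $\omega^2 = k_1^2 + k_2^2 + k_3^2 + M^2$.

```python
import math

M = 0.5                      # illustrative mass
K = (0.3, -0.4, 1.2)         # illustrative wave vector (k1, k2, k3)
OMEGA = math.sqrt(sum(k * k for k in K) + M * M)

def plane_wave(x, omega=OMEGA):
    # x = [x0, x1, x2, x3]; u = cos(k . (x1, x2, x3) - omega * x0)
    phase = sum(k * xi for k, xi in zip(K, x[1:])) - omega * x[0]
    return math.cos(phase)

def second_derivative(f, x, i, h=1e-3):
    # Central-difference estimate of the second derivative of f in x_i.
    xp, xm = list(x), list(x)
    xp[i] += h
    xm[i] -= h
    return (f(xp) - 2.0 * f(x) + f(xm)) / (h * h)

def klein_gordon_residual(f, x):
    # (Box - M^2) f with Box = -d^2/dx0^2 + d^2/dx1^2 + d^2/dx2^2 + d^2/dx3^2
    box = -second_derivative(f, x, 0) + sum(
        second_derivative(f, x, i) for i in (1, 2, 3))
    return box - M * M * f(x)

point = [0.2, 0.1, -0.3, 0.5]
print(klein_gordon_residual(plane_wave, point))                       # close to 0
print(klein_gordon_residual(lambda x: plane_wave(x, 2.0), point))     # clearly nonzero
```

The second call uses a frequency that violates the relation between $\omega$, $k$ and $M$, so the residual there is of order 1 rather than of the size of the discretization error.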


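38.2. Conservation laws for fields. Noether’s theorem (derived in Sec.
37.5) affords a general method of deriving conservation laws for fields, i.e.,
for constructing combinations of field functions, called field invariants,
which do not change in time. Thus, suppose the integral

$$
\int_\Omega \mathscr{L}(u, \nabla u)\, dx
$$

is invariant under an $r$-parameter family of transformations$^{24}$

$$
\begin{aligned}
x_i^* &= \Phi_i(x, u, \nabla u; \varepsilon) \sim x_i + \sum_{k=1}^{r} \varepsilon_k \varphi_i^{(k)} \qquad (i = 0, 1, 2, 3), \\
u_j^* &= \Psi_j(x, u, \nabla u; \varepsilon) \sim u_j + \sum_{k=1}^{r} \varepsilon_k \psi_j^{(k)} \qquad (j = 1, \ldots, m),
\end{aligned} \tag{121}
$$

where $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_r)$. Then, according to Remark 3, p. 179, we have $r$
relations of the form

$$
\operatorname{div} I^{(k)} = \sum_{i=0}^{3} \frac{\partial I_i^{(k)}}{\partial x_i} = 0 \qquad (k = 1, \ldots, r), \tag{122}
$$

where

$$
I_i^{(k)} = \sum_{j=1}^{m} \frac{\partial \mathscr{L}}{\partial (\partial u_j / \partial x_i)} \bar\psi_j^{(k)} + \mathscr{L} \varphi_i^{(k)}
$$

and

$$
\bar\psi_j^{(k)} = \psi_j^{(k)} - \sum_{i=0}^{3} \frac{\partial u_j}{\partial x_i} \varphi_i^{(k)}.
$$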

These equations have the following interesting consequence: Suppose $\Omega$ is the
cylinder $R \times [a, b]$, where $R$ is the three-dimensional sphere defined by

$$
x_1^2 + x_2^2 + x_3^2 \leqslant c^2.
$$

Let $\Gamma$ be the boundary of $\Omega$, and let $\nu$ be the unit outward normal to $\Gamma$.
Then, integrating each of the relations (122) over $\Omega$ and using Green’s
theorem [formula (5) of Sec. 35], we obtain

$$
\int_\Omega \operatorname{div} I^{(k)}\, dx = \int_\Gamma (I^{(k)}, \nu)\, d\sigma = 0 \qquad (k = 1, \ldots, r). \tag{123}
$$

The surface integral in (123) is the sum of an integral over the lateral surface
of the cylinder $\Omega$ and an integral over the two end surfaces cut off by the
planes $x_0 = a$, $x_0 = b$. As $c \to \infty$, the integral over the lateral surface
goes to zero (by the usual argument requiring that the field fall off at infinity
“sufficiently rapidly”), and we are left with the integral over the end surfaces.


24 From now on, we set n = 3. 
On these surfaces, the scalar product $(I^{(k)}, \nu)$ reduces to $\pm I_0^{(k)}$, where the plus
sign refers to the “top” surface and the minus sign to the “bottom” surface.
Therefore, taking the limit as $c \to \infty$ in (123), we find that

$$
\int I_0^{(k)}(a, x_1, x_2, x_3)\, dx_1\, dx_2\, dx_3 = \int I_0^{(k)}(b, x_1, x_2, x_3)\, dx_1\, dx_2\, dx_3 \qquad (k = 1, \ldots, r), \tag{124}
$$

where $I_0^{(k)}$ denotes the $x_0$-component of the vector $I^{(k)}$, and the integrations
extend over all of three-dimensional space, as will always be assumed if no
region of integration is indicated. Since $a$ and $b$ are arbitrary, it follows
from (124) that the quantities

$$
\int I_0^{(k)}\, dx_1\, dx_2\, dx_3 = \int \left[ \sum_{j=1}^{m} \frac{\partial \mathscr{L}}{\partial (\partial u_j / \partial x_0)} \bar\psi_j^{(k)} + \mathscr{L} \varphi_0^{(k)} \right] dx_1\, dx_2\, dx_3 \qquad (k = 1, \ldots, r) \tag{125}
$$

are independent of time. The $r$ quantities (125) are the required field invariants,
whose existence is implied by the invariance of the action functional
under the $r$-parameter family of transformations (121).

Remark. Of course, all the functions in (125) are supposed to be evaluated
on an extremal surface of the action functional, corresponding to a solution
$u(x)$ of the field equations (118).

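38.3. Conservation of energy and momentum. The action functional of
any physical field is invariant under parallel displacements, i.e., under the
family of transformations

$$
\begin{aligned}
x_i^* &= x_i + \varepsilon_i \qquad (i = 0, 1, 2, 3), \\
u_j^* &= u_j \qquad (j = 1, \ldots, m),
\end{aligned}
$$

where the $\varepsilon_i$ are arbitrary. In this case, we have

$$
\varphi_i^{(k)} = \delta_{ik}, \qquad \psi_j^{(k)} = 0,
$$

which implies

$$
\bar\psi_j^{(k)} = -\sum_{i=0}^{3} \frac{\partial u_j}{\partial x_i} \delta_{ik} = -\frac{\partial u_j}{\partial x_k},
$$

where $\delta_{ik}$ is the Kronecker delta. According to (125), the corresponding
field invariants are

$$
\int \left( \mathscr{L} \delta_{0k} - \sum_{j=1}^{m} \frac{\partial \mathscr{L}}{\partial (\partial u_j / \partial x_0)} \frac{\partial u_j}{\partial x_k} \right) dx_1\, dx_2\, dx_3 \qquad (k = 0, 1, 2, 3). \tag{126}
$$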
It is convenient to introduce the second-rank tensor

$$
T_{ik} = \sum_{j=1}^{m} \frac{\partial \mathscr{L}}{\partial (\partial u_j / \partial x_i)} \frac{\partial u_j}{\partial x_k} - \mathscr{L} \delta_{ik}, \tag{127}
$$

called the energy-momentum tensor. In terms of $T_{ik}$, the field invariants are

$$
P_k = \int T_{0k}\, dx_1\, dx_2\, dx_3 \qquad (k = 0, 1, 2, 3).
$$

The vector

$$
P = (P_0, P_1, P_2, P_3)
$$

is called the energy-momentum vector, and in fact, it can be shown that $P_0$ is
the energy and $P_1, P_2, P_3$ the momentum components of the field. Thus,
since $P$ is a field invariant, we have just proved that the energy and momentum
of the field are conserved.

38.4. Conservation of angular momentum. According to the special
theory of relativity, the action functional of any physical field is invariant
under orthochronous Lorentz transformations, i.e., under transformations of
four-dimensional space-time which leave the quadratic form

$$
-x_0^2 + x_1^2 + x_2^2 + x_3^2
$$

invariant and preserve the time direction.$^{25}$ For simplicity, we consider
the case where $u(x)$ is a scalar field ($m = 1$). Then the action functional
must be invariant under the family of (infinitesimal) transformations

$$
\begin{aligned}
x_i^* &\sim x_i + \sum_{l \neq i} g_{ll}\, \varepsilon_{il}\, x_l, \\
u^* &= u,
\end{aligned} \tag{128}
$$

where

$$
g_{00} = -1, \qquad g_{11} = g_{22} = g_{33} = 1
$$

and

$$
\varepsilon_{kl} = -\varepsilon_{lk} \qquad (k \neq l) \tag{129}
$$

are the parameters determining the given transformation.$^{26}$ Since the
twelve parameters $\varepsilon_{kl}$ ($k \neq l$) are connected by the relations (129), only six
of them are independent, and we choose the independent parameters to be
those for which $k < l$.

25 The determinant of the matrix corresponding to a Lorentz transformation equals
±1, where the plus sign corresponds to the so-called proper Lorentz transformations.
See e.g., V. I. Smirnov, Linear Algebra and Group Theory, translated by R. A. Silverman,
McGraw-Hill Book Co., Inc., New York (1961), Chap. 7.

26 The parameters $\varepsilon_{12}, \varepsilon_{13}, \varepsilon_{23}$ are angles of rotation, while $\varepsilon_{01}, \varepsilon_{02}, \varepsilon_{03}$ are certain
expressions involving the velocity of light and the velocity of one physical reference
frame with respect to the other.
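Corresponding to the transformations (128), we have

$$
\begin{aligned}
\delta x_i &= \sum_{l \neq i} g_{ll}\, \varepsilon_{il}\, x_l = \sum_{k=0}^{3} \sum_{l \neq k} g_{ll}\, \varepsilon_{kl}\, \delta_{ik}\, x_l \\
&= \sum_{k < l} g_{ll}\, \varepsilon_{kl}\, \delta_{ik}\, x_l + \sum_{k > l} g_{ll}\, \varepsilon_{kl}\, \delta_{ik}\, x_l \\
&= \sum_{k < l} \varepsilon_{kl} \left( g_{ll}\, \delta_{ik}\, x_l - g_{kk}\, \delta_{il}\, x_k \right),
\end{aligned}
$$

where $\delta_{ik}$ is the Kronecker delta, and

$$
\overline{\delta u} = -\sum_{i=0}^{3} \frac{\partial u}{\partial x_i}\, \delta x_i.
$$

It follows that

$$
\begin{aligned}
\varphi_i^{(kl)} &= g_{ll}\, \delta_{ik}\, x_l - g_{kk}\, \delta_{il}\, x_k, \\
\bar\psi^{(kl)} &= \sum_{i=0}^{3} \frac{\partial u}{\partial x_i} \left( g_{kk}\, \delta_{il}\, x_k - g_{ll}\, \delta_{ik}\, x_l \right) = \frac{\partial u}{\partial x_l}\, g_{kk} x_k - \frac{\partial u}{\partial x_k}\, g_{ll} x_l,
\end{aligned}
$$

where the pair of indices $k, l$ plays the same role as the single index $k$ in (121)
and ranges over the six combinations

0,1; 0,2; 0,3; 1,2; 1,3; 2,3.

According to (125), the corresponding field invariants are

$$
\int \left\{ \frac{\partial \mathscr{L}}{\partial (\partial u / \partial x_0)} \left[ \frac{\partial u}{\partial x_l}\, g_{kk} x_k - \frac{\partial u}{\partial x_k}\, g_{ll} x_l \right] + \mathscr{L} \left[ g_{ll}\, \delta_{0k}\, x_l - g_{kk}\, \delta_{0l}\, x_k \right] \right\} dx_1\, dx_2\, dx_3 \qquad (k < l). \tag{130}
$$

It is convenient to introduce the third-rank tensor

$$
\begin{aligned}
M_{ikl} &= \frac{\partial \mathscr{L}}{\partial (\partial u / \partial x_i)} \left[ \frac{\partial u}{\partial x_l}\, g_{kk} x_k - \frac{\partial u}{\partial x_k}\, g_{ll} x_l \right] + \mathscr{L} \left[ g_{ll}\, \delta_{ik}\, x_l - g_{kk}\, \delta_{il}\, x_k \right] \qquad (k < l), \\
M_{ikl} &= -M_{ilk} \qquad (k > l),
\end{aligned} \tag{131}
$$

called the angular momentum tensor. By definition, $M_{ikl}$ is antisymmetric
in the indices $k$ and $l$. Using the expression (127) for the energy-momentum
tensor (specialized to the case of scalar fields), we can write (131) as

$$
M_{ikl} = g_{kk} x_k T_{il} - g_{ll} x_l T_{ik}.
$$

In terms of $M_{ikl}$, the field invariants are

$$
\int M_{0kl}\, dx_1\, dx_2\, dx_3 \qquad (k < l),
$$

a fact summarized by saying that the angular momentum of the field is
conserved.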
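Example. Using the quantities $g_{ii}$, we can write the Lagrangian density
(120) corresponding to the Klein-Gordon equation in the form

$$
\mathscr{L} = -\frac{1}{2} \left[ \sum_{i=0}^{3} g_{ii} \left( \frac{\partial u}{\partial x_i} \right)^2 + M^2 u^2 \right]. \tag{132}
$$

This leads to the energy-momentum tensor

$$
T_{ik} = -g_{ii} \frac{\partial u}{\partial x_i} \frac{\partial u}{\partial x_k} - \mathscr{L} \delta_{ik}
$$

and the angular momentum tensor

$$
M_{ikl} = g_{ii} \frac{\partial u}{\partial x_i} \left( \frac{\partial u}{\partial x_k}\, g_{ll} x_l - \frac{\partial u}{\partial x_l}\, g_{kk} x_k \right) + \mathscr{L} \left( g_{ll} x_l\, \delta_{ik} - g_{kk} x_k\, \delta_{il} \right).
$$

The energy density corresponding to (132) is

$$
T_{00} = \frac{1}{2} \left[ \sum_{i=0}^{3} \left( \frac{\partial u}{\partial x_i} \right)^2 + M^2 u^2 \right],
$$

while the momentum density has the components

$$
T_{0k} = \frac{\partial u}{\partial x_0} \frac{\partial u}{\partial x_k} \qquad (k = 1, 2, 3).
$$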


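38.5. The electromagnetic field. To illustrate the methods developed
above, we now derive the equations of the electromagnetic field from a
suitable Lagrangian density. The electromagnetic field is described by two
three-dimensional vectors, the electric field vector $E = (E_1, E_2, E_3)$ and the
magnetic field vector $H = (H_1, H_2, H_3)$. In the absence of electric charges,
$E$ and $H$ are related by the familiar Maxwell equations

$$
\begin{aligned}
\operatorname{curl} E &= -\frac{\partial H}{\partial x_0}, & \operatorname{div} H &= 0, \\
\operatorname{curl} H &= \frac{\partial E}{\partial x_0}, & \operatorname{div} E &= 0,
\end{aligned} \tag{133}
$$

where

$$
\begin{aligned}
\operatorname{div} E &= \frac{\partial E_1}{\partial x_1} + \frac{\partial E_2}{\partial x_2} + \frac{\partial E_3}{\partial x_3}, \\
\operatorname{curl} E &= \left( \frac{\partial E_3}{\partial x_2} - \frac{\partial E_2}{\partial x_3},\ \frac{\partial E_1}{\partial x_3} - \frac{\partial E_3}{\partial x_1},\ \frac{\partial E_2}{\partial x_1} - \frac{\partial E_1}{\partial x_2} \right),
\end{aligned}
$$

and similarly for $\operatorname{div} H$, $\operatorname{curl} H$. It is convenient to express $E$ and $H$ in
terms of a four-dimensional electromagnetic potential $\{A_j\} = (A_0, A_1, A_2, A_3)$,$^{27}$
by setting

$$
E = \operatorname{grad} A_0 - \frac{\partial A}{\partial x_0}, \qquad H = \operatorname{curl} A, \tag{134}
$$

27 Since the symbol $A$ is reserved for the three-dimensional vector $(A_1, A_2, A_3)$, we
denote the four-dimensional vector $(A_0, A_1, A_2, A_3)$ by $\{A_j\}$. $A$ is sometimes called the
vector potential and $A_0$ the scalar potential.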
where

$$
A = (A_1, A_2, A_3)
$$

and

$$
\operatorname{grad} A_0 = \left( \frac{\partial A_0}{\partial x_1}, \frac{\partial A_0}{\partial x_2}, \frac{\partial A_0}{\partial x_3} \right).
$$

The potential $\{A_j\}$ is not uniquely determined by the vectors $E$ and $H$.
In fact, $E$ and $H$ do not change if we make a gauge transformation, i.e., if
we replace $\{A_j\}$ by a new potential $\{A_j'\}$ with components

$$
A_j'(x) = A_j(x) + \frac{\partial f(x)}{\partial x_j} \qquad (j = 0, 1, 2, 3),
$$

where $x = (x_0, x_1, x_2, x_3)$ and $f(x)$ is an arbitrary function. To avoid this
lack of uniqueness, an extra condition can be imposed on $\{A_j\}$. The
condition usually chosen is

$$
\sum_{j=0}^{3} g_{jj} \frac{\partial A_j}{\partial x_j} = 0, \tag{135}
$$

and is known as the Lorentz condition.

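Next, we prove that the Maxwell equations (133) reduce to a single equation
determining the electromagnetic potential $\{A_j\}$. First, we introduce the
antisymmetric tensor $H_{ij}$, whose matrix

$$
\begin{pmatrix}
0 & -E_1 & -E_2 & -E_3 \\
E_1 & 0 & H_3 & -H_2 \\
E_2 & -H_3 & 0 & H_1 \\
E_3 & H_2 & -H_1 & 0
\end{pmatrix}
$$

is formed from the components of $E$ and $H$. It is easily verified that the
formula relating $H_{ij}$ to the potential $\{A_j\}$ is

$$
H_{ij} = \frac{\partial A_j}{\partial x_i} - \frac{\partial A_i}{\partial x_j}. \tag{136}
$$

In terms of the tensor $H_{ij}$, we can write the Maxwell equations (133) in the
form

$$
\sum_{i=0}^{3} g_{ii} \frac{\partial H_{ij}}{\partial x_i} = 0 \qquad (j = 0, 1, 2, 3), \tag{137}
$$

$$
\frac{\partial H_{ij}}{\partial x_k} + \frac{\partial H_{jk}}{\partial x_i} + \frac{\partial H_{ki}}{\partial x_j} = 0, \tag{138}
$$

where in (138), the triple $i, j, k$ takes the values

0, 1, 2;  1, 2, 3;  2, 3, 0;  3, 0, 1.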
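Substituting (136) into (137) and (138), and using the Lorentz condition (135),
we find that (138) is an identity, while (137) reduces to

$$
\Box A_j = 0 \qquad (j = 0, 1, 2, 3), \tag{139}
$$

where $\Box$ is the D’Alembertian

$$
\Box = -\frac{\partial^2}{\partial x_0^2} + \frac{\partial^2}{\partial x_1^2} + \frac{\partial^2}{\partial x_2^2} + \frac{\partial^2}{\partial x_3^2}.
$$

Finally, we show that (139) is a consequence of the principle of stationary
action,$^{28}$ if we choose the Lagrangian density of the electromagnetic field
to be

$$
\mathscr{L} = \frac{1}{8\pi} (E^2 - H^2). \tag{140}
$$

Replacing $E$ and $H$ in (140) by their expressions (134) in terms of the
electromagnetic potential $\{A_j\}$, we obtain

$$
\mathscr{L} = \frac{1}{8\pi} \left[ \left( \operatorname{grad} A_0 - \frac{\partial A}{\partial x_0} \right)^2 - (\operatorname{curl} A)^2 \right]. \tag{141}
$$

We shall only verify that the Euler equations

$$
\frac{\partial \mathscr{L}}{\partial A_j} - \sum_{i=0}^{3} \frac{\partial}{\partial x_i} \frac{\partial \mathscr{L}}{\partial (\partial A_j / \partial x_i)} = 0 \qquad (j = 0, 1, 2, 3) \tag{142}
$$

corresponding to (141) can be reduced to the form (139) for the component
$A_0$, since the calculations for $A_1, A_2, A_3$ are completely analogous. It
follows from (141) that

$$
\begin{aligned}
\frac{\partial \mathscr{L}}{\partial A_0} &= 0, & \frac{\partial \mathscr{L}}{\partial (\partial A_0 / \partial x_0)} &= 0, \\
\frac{\partial \mathscr{L}}{\partial (\partial A_0 / \partial x_1)} &= \frac{1}{4\pi} \left( \frac{\partial A_0}{\partial x_1} - \frac{\partial A_1}{\partial x_0} \right), & \frac{\partial \mathscr{L}}{\partial (\partial A_0 / \partial x_2)} &= \frac{1}{4\pi} \left( \frac{\partial A_0}{\partial x_2} - \frac{\partial A_2}{\partial x_0} \right), \\
\frac{\partial \mathscr{L}}{\partial (\partial A_0 / \partial x_3)} &= \frac{1}{4\pi} \left( \frac{\partial A_0}{\partial x_3} - \frac{\partial A_3}{\partial x_0} \right).
\end{aligned}
$$

28 Provided $\{A_j\}$ satisfies the Lorentz condition.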
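Thus, for $j = 0$, (142) becomes

$$
\frac{\partial \mathscr{L}}{\partial A_0} - \sum_{i=0}^{3} \frac{\partial}{\partial x_i} \frac{\partial \mathscr{L}}{\partial (\partial A_0 / \partial x_i)}
= -\frac{1}{4\pi} \left[ \frac{\partial^2 A_0}{\partial x_1^2} + \frac{\partial^2 A_0}{\partial x_2^2} + \frac{\partial^2 A_0}{\partial x_3^2} - \frac{\partial}{\partial x_0} \left( \frac{\partial A_1}{\partial x_1} + \frac{\partial A_2}{\partial x_2} + \frac{\partial A_3}{\partial x_3} \right) \right] = 0. \tag{143}
$$

According to the Lorentz condition (135),

$$
\frac{\partial A_1}{\partial x_1} + \frac{\partial A_2}{\partial x_2} + \frac{\partial A_3}{\partial x_3} = \frac{\partial A_0}{\partial x_0},
$$

and hence (143) reduces to

$$
-\frac{\partial^2 A_0}{\partial x_0^2} + \frac{\partial^2 A_0}{\partial x_1^2} + \frac{\partial^2 A_0}{\partial x_2^2} + \frac{\partial^2 A_0}{\partial x_3^2} = \Box A_0 = 0,
$$

which is just (139), for $j = 0$.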

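Remark 1. In deriving (139) from (141), we made use of the Lorentz
condition (135). Instead, we could have introduced an additional term
into the Lagrangian density by writing

$$
\mathscr{L} = \frac{1}{8\pi} \left\{ \left( \operatorname{grad} A_0 - \frac{\partial A}{\partial x_0} \right)^2 - (\operatorname{curl} A)^2 - \left( \operatorname{div} A - \frac{\partial A_0}{\partial x_0} \right)^2 \right\}, \tag{144}
$$

which reduces to (141) if the Lorentz condition is satisfied. The Euler
equations corresponding to (144) reduce to (139) for arbitrary $\{A_j\}$.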

Remark 2. The Lagrangian density of the electromagnetic field, and hence
its action functional, is invariant under parallel displacements, Lorentz
transformations and gauge transformations. According to Sec. 38.3, the
invariance under parallel displacements implies conservation of energy and
momentum of the field, while, according to Sec. 38.4, the invariance under
Lorentz transformations implies conservation of angular momentum of the
field. Moreover, according to Remark 4, p. 179, the invariance under gauge
transformations (which depend on one arbitrary function) implies the existence
of a relation between the left-hand sides of the corresponding Euler
equations (139). Therefore, these equations do not uniquely determine
the electromagnetic potential $\{A_j\}$. In fact, to determine $\{A_j\}$ uniquely,
we need an extra equation, which is usually chosen to be the Lorentz condition
(135).$^{29}$

29 The Maxwell equations are actually invariant under a 15-parameter family (group)
of transformations. In addition to the 10 conservation laws already mentioned (energy,
momentum and angular momentum), this invariance leads to 5 more conservation laws,
which, however, do not have direct physical meaning. For a detailed treatment of this
problem, see E. Bessel-Hagen, Über die Erhaltungssätze der Elektrodynamik, Math. Ann.,
84, 258 (1921).
PROBLEMS


1. Find the Euler equation of the functional

$$
J[u] = \int \cdots \int_R \sum_{i=1}^{n} u_{x_i}^2\, dx_1 \cdots dx_n.
$$

2. Find the Euler equation of the functional

$$
J[u] = \iiint_R \sqrt{1 + u_x^2 + u_y^2 + u_z^2}\; dx\, dy\, dz.
$$

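3. Write the appropriate generalization of the Euler equation for the
functional

$$
J[u] = \iint_R F(x, y, u, u_x, u_y, u_{xx}, u_{xy}, u_{yy})\, dx\, dy.
$$

4. Starting from Green’s theorem

$$
\iint_R \left( \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \right) dx\, dy = \oint_\Gamma (P\, dx + Q\, dy),
$$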

prove that 


+ £(*1 - ♦SK 

- 11 +" l( f T, ~ 

5. Let J[u] be the functional 

\f t 1 jJ R [-("« + u yy) 2 + 2(1 - ix)(u xx u yy - u? y )]dxdydt. 

Using the result of the preceding problem, prove that if we go from u to u + 
then 


U = e 1 JJ (~ dx dy dt + e 1 J |V(w)^ + A/(w)^j ds dt, 

where $M(u)$ and $P(u)$ are given by formulas (61) and (62).

Hint. Express $\partial\psi/\partial x$, $\partial\psi/\partial y$ in terms of $\partial\psi/\partial n$, $\partial\psi/\partial s$, and use integration
by parts to get rid of $\partial\psi/\partial s$.

6. Show that when n = 1, formula (105) of Sec. 37.4 reduces to formula (7) 
of Sec. 13. 


7. Given the functional 


compute $J[\sigma^*]$ if $\sigma^*$ is obtained from $\sigma$ by the transformation (108).





8 . Derive the Euler equations corresponding to the Lagrangian density 


i = 0 1 i = 0 j = 0 \V*'j/ i = 0 


where the field variables are $u, A_0, A_1, A_2, A_3$, and the factor $e_i$ equals 1
if $i = 0$ and $-1$ if $i = 1, 2, 3$.

9. Show that the Lagrangian density $\mathscr{L}$ of the preceding problem is Lorentz-invariant
if $u$ transforms like a scalar and if $A_0, A_1, A_2, A_3$ transform like the
components of a vector under Lorentz transformations. Use this fact to
derive various conservation laws for the field described by $\mathscr{L}$.



8 


DIRECT METHODS 
IN THE 

CALCULUS OF VARIATIONS 


So far, the basic approach used to solve a given variational problem
(and indeed, to prove the existence of a solution) has been to reduce the problem
to one involving a differential equation (or perhaps a system of differential
equations). However, this approach is not always effective, and is
greatly complicated by the fact that what is needed to solve a given variational
problem is not a solution of the corresponding differential equation
in a small neighborhood of some point (as is usually the case in the theory of
differential equations), but rather a solution in some fixed region $R$, which
satisfies prescribed boundary conditions on the boundary of $R$. The
difficulties inherent in this approach (especially when several independent
variables are involved, so that the differential equation is a partial differential
equation) have led to a search for variational methods of a different kind,
known as direct methods, which do not entail the reduction of variational
problems to problems involving differential equations.

Once they have been developed, direct variational methods can be used to
solve differential equations, and this technique, the inverse of the one we 
have used until now, plays an important role in the modern theory of the 
subject. The basic idea is the following: Suppose it can be shown that a 
given differential equation is the Euler equation of some functional, and 
suppose it has been proved somehow that this functional has an extremum 
for a sufficiently smooth admissible function. Then, this very fact proves 
that the differential equation has a solution satisfying the boundary conditions
corresponding to the given variational problem. Moreover, as we
shall show below (Sec. 41), variational methods can be used not only to
prove the existence of a solution of the original differential equation, but also
to calculate a solution to any desired accuracy.


39. Minimizing Sequences 

There are many different techniques lumped together under the heading 
of “direct methods.” However, the direct methods considered here are all 
based on the same general idea, which goes as follows: 

Consider the problem of finding the minimum of a functional $J[y]$ defined
on a space $\mathscr{M}$ of admissible functions $y$. For the problem to make sense,
it must be assumed that there are functions in $\mathscr{M}$ for which $J[y] < +\infty$,
and moreover that$^{1}$

$$
\inf_y J[y] = \mu > -\infty, \tag{1}
$$

where the greatest lower bound is taken over all admissible $y$. Then, by
the definition of $\mu$, there exists an infinite sequence of functions $\{y_n\} = y_1, y_2, \ldots$,
called a minimizing sequence, such that

$$
\lim_{n \to \infty} J[y_n] = \mu.
$$

If the sequence $\{y_n\}$ has a limit function $\hat y$, and if it is legitimate to write

$$
J[\hat y] = \lim_{n \to \infty} J[y_n], \tag{2}
$$

i.e.,

$$
J[\lim_{n \to \infty} y_n] = \lim_{n \to \infty} J[y_n],
$$

then

$$
J[\hat y] = \mu,
$$

and $\hat y$ is the solution of the variational problem. Moreover, the functions
of the minimizing sequence $\{y_n\}$ can be regarded as approximate solutions
of our problem.

Thus, to solve a given variational problem by the direct method, we must

1. Construct a minimizing sequence $\{y_n\}$;

2. Prove that $\{y_n\}$ has a limit function $\hat y$;

3. Prove the legitimacy of taking the limit (2).

Remark 1. Two direct methods, the Ritz method and the method of finite 
differences , each involving the construction of a minimizing sequence, will 
be discussed in the next section. We reiterate that a minimizing sequence 
can always be constructed if (1) holds. 


1 By inf is meant the greatest lower bound or infimum. 
Remark 2. Even if a minimizing sequence $\{y_n\}$ exists for a given variational
problem, it may not have a limit function. For example, consider
the functional

$$
J[y] = \int_{-1}^{1} x^2 y'^2\, dx,
$$

where

$$
y(-1) = -1, \qquad y(1) = 1. \tag{3}
$$

Obviously, $J[y]$ takes only positive values and

$$
\inf_y J[y] = 0.
$$

We can choose

$$
y_n(x) = \frac{\tan^{-1} nx}{\tan^{-1} n} \qquad (n = 1, 2, \ldots) \tag{4}
$$

as the minimizing sequence, since

$$
\int_{-1}^{1} \frac{n^2 x^2\, dx}{(\tan^{-1} n)^2 (1 + n^2 x^2)^2} < \frac{1}{(\tan^{-1} n)^2} \int_{-1}^{1} \frac{dx}{1 + n^2 x^2} = \frac{2}{n \tan^{-1} n},
$$

and hence $J[y_n] \to 0$ as $n \to \infty$. But as $n \to \infty$, the sequence (4) has no limit
in the class of continuous functions satisfying the boundary conditions (3).
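The behavior described in this example is easy to observe numerically. The sketch below is an illustration only (the midpoint rule, the number of cells and the sample values of $n$ are choices made here, not part of the text). It evaluates $J[y_n]$ for $y_n(x) = \tan^{-1} nx / \tan^{-1} n$ by quadrature, compares it with the bound $2/(n \tan^{-1} n)$, and also shows that $y_n$ at a fixed small $x > 0$ approaches 1, so that the pointwise limit jumps from $-1$ to $1$ at $x = 0$.

```python
import math

def y_n(x, n):
    # The candidate minimizing sequence y_n(x) = arctan(n x) / arctan(n).
    return math.atan(n * x) / math.atan(n)

def y_n_prime(x, n):
    # Derivative of y_n with respect to x.
    return n / ((1.0 + (n * x) ** 2) * math.atan(n))

def J(n, cells=20000):
    # Midpoint-rule approximation of the integral of x^2 y_n'(x)^2 over [-1, 1].
    h = 2.0 / cells
    total = 0.0
    for k in range(cells):
        x = -1.0 + (k + 0.5) * h
        total += x * x * y_n_prime(x, n) ** 2
    return total * h

def bound(n):
    # Upper bound 2 / (n arctan n) derived in the text.
    return 2.0 / (n * math.atan(n))

values = [(n, J(n), bound(n)) for n in (1, 10, 100)]
for n, j, b in values:
    print(n, j, b)                 # J[y_n] stays below the bound and decreases
print(y_n(1e-3, 10 ** 6))          # already close to 1 at x = 0.001
```

The computed values decrease roughly like $1/n$, in agreement with the bound, while the functions themselves steepen near $x = 0$ without settling on any continuous limit.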

Even if the minimizing sequence $\{y_n\}$ has a limit $\hat y$ in the sense of the
$\mathscr{C}$-norm (i.e., $y_n \to \hat y$ as $n \to \infty$, without any assumptions about the convergence
of the derivatives of $y_n$), it is still no trivial matter to justify taking the limit
(2), since in general, the functionals considered in the calculus of variations
are not continuous in the $\mathscr{C}$-norm. However, (2) still holds if continuity
of $J[y]$ is replaced by a weaker condition:

Theorem. If $\{y_n\}$ is a minimizing sequence of the functional $J[y]$, with
limit function $\hat y$, and if $J[y]$ is lower semicontinuous at $\hat y$,$^{2}$ then

$$
J[\hat y] = \lim_{n \to \infty} J[y_n].
$$

Proof. On the one hand,

$$
J[\hat y] \geqslant \lim_{n \to \infty} J[y_n] = \inf_y J[y], \tag{5}
$$

while, on the other hand, given any $\varepsilon > 0$,

$$
J[y_n] - J[\hat y] > -\varepsilon, \tag{6}
$$

if $n$ is sufficiently large. Letting $n \to \infty$ in (6), we obtain

$$
J[\hat y] \leqslant \lim_{n \to \infty} J[y_n] + \varepsilon,
$$

or

$$
J[\hat y] \leqslant \lim_{n \to \infty} J[y_n], \tag{7}
$$

since $\varepsilon$ is arbitrary. Comparing (5) and (7), we find that

$$
J[\hat y] = \lim_{n \to \infty} J[y_n],
$$

as asserted.

2 See Remark 1, p. 7.


40. The Ritz Method and the Method of Finite Differences$^{3}$

40.1. First, we describe the Ritz method, one of the most widely used direct
variational methods. Suppose we are looking for the minimum of a functional
$J[y]$ defined on some space $\mathscr{M}$ of admissible functions, which for
simplicity we take to be a normed linear space. Let

$$
\varphi_1, \varphi_2, \ldots \tag{8}
$$

be an infinite sequence of functions in $\mathscr{M}$, and let $\mathscr{M}_n$ be the $n$-dimensional
linear subspace of $\mathscr{M}$ spanned by the first $n$ of the functions (8), i.e., the set
of all linear combinations of the form

$$
\alpha_1 \varphi_1 + \cdots + \alpha_n \varphi_n, \tag{9}
$$

where $\alpha_1, \ldots, \alpha_n$ are arbitrary real numbers. Then, on each subspace $\mathscr{M}_n$,
the functional $J[y]$ leads to a function

$$
J[\alpha_1 \varphi_1 + \cdots + \alpha_n \varphi_n] \tag{10}
$$

of the $n$ variables $\alpha_1, \ldots, \alpha_n$.

Next, we choose $\alpha_1, \ldots, \alpha_n$ in such a way as to minimize (10), denoting
the minimum by $\mu_n$ and the element of $\mathscr{M}_n$ which yields the minimum by $y_n$.
(In principle, this is a much simpler problem than finding the minimum of the
functional $J[y]$ itself.) Clearly, $\mu_n$ cannot increase with $n$, i.e.,

$$
\mu_1 \geqslant \mu_2 \geqslant \cdots,
$$

since any linear combination of $\varphi_1, \ldots, \varphi_n$ is automatically a linear combination
of $\varphi_1, \ldots, \varphi_n, \varphi_{n+1}$. Correspondingly, each subspace of the sequence

$$
\mathscr{M}_1, \mathscr{M}_2, \ldots
$$

is contained in the next. We now give conditions which guarantee that the
sequence $\{y_n\}$ is a minimizing sequence.

Definition. The sequence (8) is said to be complete (in $\mathscr{M}$) if given
any $y \in \mathscr{M}$ and any $\varepsilon > 0$, there is a linear combination $\eta_n$ of the form (9)
such that $\|\eta_n - y\| < \varepsilon$ (where $n$ depends on $\varepsilon$).

3 Here we merely outline these two methods, without worrying about questions of
convergence, and taking for granted the existence of an exact solution of the given
variational problem.
Theorem. If the functional $J[y]$ is continuous,$^{4}$ and if the sequence (8)
is complete, then

$$
\lim_{n \to \infty} \mu_n = \mu,
$$

where

$$
\mu = \inf_y J[y].
$$

Proof. Given any $\varepsilon > 0$, let $y^*$ be such that

$$
J[y^*] < \mu + \varepsilon.
$$

(Such a $y^*$ exists for any $\varepsilon > 0$, by the definition of $\mu$.) Since $J[y]$ is
continuous,

$$
|J[y] - J[y^*]| < \varepsilon, \tag{11}
$$

provided that $\|y - y^*\| < \delta = \delta(\varepsilon)$. Let $\eta_n$ be a linear combination of
the form (9) such that $\|\eta_n - y^*\| < \delta$. (Such an $\eta_n$ exists for sufficiently
large $n$, since $\{\varphi_n\}$ is complete.) Moreover, let $y_n$ be the linear
combination of the form (9) for which (10) achieves its minimum.
Then, using (11), we find that

$$
\mu \leqslant J[y_n] \leqslant J[\eta_n] < \mu + 2\varepsilon.
$$

Since $\varepsilon$ is arbitrary, it follows that

$$
\lim_{n \to \infty} J[y_n] = \lim_{n \to \infty} \mu_n = \mu,
$$

as asserted.


Remark 1. The geometric idea of the proof is the following: If $\{\varphi_n\}$ is complete, then any element in the infinite-dimensional space $\mathcal{M}$ can be approximated arbitrarily closely by an element in the finite-dimensional space $\mathcal{M}_n$ (for large enough $n$). We can summarize this fact by writing

$$\lim_{n \to \infty} \mathcal{M}_n = \mathcal{M}.$$

Let $\hat{y}$ be the element in $\mathcal{M}$ for which $J[\hat{y}] = \mu$, and let $\hat{y}_n \in \mathcal{M}_n$ be a sequence of functions converging to $\hat{y}$. Then $\{\hat{y}_n\}$ is a minimizing sequence, since $J[y]$ is continuous. Although this minimizing sequence cannot be constructed without prior knowledge of $\hat{y}$, we can show that our explicitly constructed sequence $\{y_n\}$ takes values $J[y_n]$ arbitrarily close to $J[\hat{y}_n]$, and hence is itself a minimizing sequence.

4 I.e., continuous in the norm of $\mathcal{M}$. For example, functionals of the form

$$J[y] = \int_a^b F(x, y, y')\,dx$$

are continuous in the norm of the space $\mathcal{D}_1(a, b)$.

Remark 2. The speed of convergence of the Ritz method for a given variational problem obviously depends both on the problem itself and on the choice of the functions $\varphi_n$. However, it should be pointed out that in many cases, linear combinations involving only a very small number of functions $\varphi_n$ are enough to give a quite satisfactory approximation to the exact solution.
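The remark can be illustrated numerically. The following is only a sketch under stated assumptions: it takes the functional $J[y] = \int_0^1 (y'^2 - y^2 - 2xy)\,dx$ with $y(0) = y(1) = 0$ and the functions $\varphi_k = x^k(1 - x)$ as a test case, and the helper names `integrate01`, `ritz` and `exact` are hypothetical. On the span of $\varphi_1, \ldots, \varphi_n$ the functional becomes a quadratic function of $\alpha_1, \ldots, \alpha_n$, so minimizing (10) reduces to solving a linear system.

```python
import numpy as np
from numpy.polynomial import Polynomial

X = Polynomial([0.0, 1.0])  # the polynomial x


def integrate01(p):
    """Exact integral of a polynomial over [0, 1]."""
    q = p.integ()
    return q(1.0) - q(0.0)


def ritz(n):
    """Ritz step for J[y] = int_0^1 (y'^2 - y^2 - 2xy) dx, y(0) = y(1) = 0.

    Returns the coefficients alpha_1..alpha_n, the minimum mu_n, and y_n.
    """
    phis = [X**k * (1 - X) for k in range(1, n + 1)]
    # On the span of phis, J = alpha^T A alpha - 2 b^T alpha.
    A = np.array([[integrate01(p.deriv() * q.deriv() - p * q) for q in phis]
                  for p in phis])
    b = np.array([integrate01(X * p) for p in phis])
    alpha = np.linalg.solve(A, b)      # stationarity condition: A alpha = b
    mu_n = -float(b @ alpha)           # value of J at the minimizer
    y_n = sum((float(a) * p for a, p in zip(alpha, phis)), Polynomial([0.0]))
    return alpha, mu_n, y_n


def exact(x):
    """Solution of the Euler equation y'' + y = -x with y(0) = y(1) = 0."""
    return np.sin(x) / np.sin(1.0) - x


for n in (1, 2, 3):
    alpha, mu_n, y_n = ritz(n)
    print(n, mu_n, y_n(0.5), exact(0.5))
```

In this test case the printed minima $\mu_n$ do not increase with $n$, and already a few functions $\varphi_k$ reproduce the exact solution at $x = 1/2$ to several digits.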

Remark 3. More generally, the spaces $\mathcal{M}$ and $\mathcal{M}_n$ need not be normed linear spaces themselves, but only suitable sets of admissible functions belonging to an underlying normed linear space $\mathcal{R}$ (see Remark 3, p. 8). For example, the admissible functions may satisfy boundary conditions like

$$y(a) = A, \qquad y(b) = B$$

(see Sec. 40.2), or a subsidiary condition like

$$\int_a^b y^2(x)\,dx = 1$$

(see Sec. 41). This case can be handled by appropriate modifications of the present method.

40.2. We now describe another method involving a sequence of finite-dimensional approximations to the space $\mathcal{M}$. This is the method of finite differences, which has already been encountered in Sec. 7. There, in connection with the derivation of Euler's equation, we noted that the problem of finding an extremum of the functional 5

$$J[y] = \int_a^b F(x, y, y')\,dx, \qquad y(a) = A, \quad y(b) = B, \tag{12}$$

can be approximated by the problem of finding an extremum of a function of $n$ variables, obtained as follows: We divide the interval $[a, b]$ into $n + 1$ equal subintervals by introducing the points

$$x_0 = a, \; x_1, \ldots, x_n, \; x_{n+1} = b, \qquad x_{i+1} - x_i = \Delta x,$$

and we replace the function $y(x)$ by the polygonal line with vertices

$$(x_0, y_0), (x_1, y_1), \ldots, (x_n, y_n), (x_{n+1}, y_{n+1}),$$

where now $y_i = y(x_i)$. Then (12) can be approximated by the sum

$$J(y_1, \ldots, y_n) = \sum_{i=0}^{n} F\Bigl(x_i, y_i, \frac{y_{i+1} - y_i}{\Delta x}\Bigr)\Delta x, \tag{13}$$

which is a function of $n$ variables. (Recall that $y_0 = A$ and $y_{n+1} = B$ are fixed.) If for each $n$, we find the polygonal line minimizing (13), we obtain a sequence of approximate solutions to the original variational problem.
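As an illustration of the finite-difference construction, here is a minimal sketch assuming the particular integrand $F = y'^2 - y^2 - 2xy$ on $[0, 1]$ with $A = B = 0$ (a hypothetical test case, as are the function names). For this quadratic $F$, setting the partial derivatives of the sum (13) equal to zero gives a tridiagonal linear system for $y_1, \ldots, y_n$.

```python
import numpy as np


def finite_difference_minimizer(n):
    """Minimize the sum (13) for F = y'^2 - y^2 - 2xy on [0, 1], y(0) = y(1) = 0.

    Setting the partial derivatives of (13) with respect to y_1, ..., y_n
    equal to zero gives the linear system
        (2*y_j - y_{j-1} - y_{j+1}) / dx**2 - y_j = x_j,   j = 1, ..., n.
    """
    dx = 1.0 / (n + 1)
    x = np.linspace(0.0, 1.0, n + 2)           # x_0, ..., x_{n+1}
    main = (2.0 / dx**2 - 1.0) * np.ones(n)
    off = (-1.0 / dx**2) * np.ones(n - 1)
    K = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
    y = np.zeros(n + 2)                         # y_0 = A = 0, y_{n+1} = B = 0
    y[1:-1] = np.linalg.solve(K, x[1:-1])
    return x, y


def discrete_J(x, y):
    """The sum (13) itself, evaluated on the polygonal line with vertices (x_i, y_i)."""
    dx = x[1] - x[0]
    slope = np.diff(y) / dx                     # (y_{i+1} - y_i) / dx, i = 0..n
    return float(np.sum((slope**2 - y[:-1]**2 - 2.0 * x[:-1] * y[:-1]) * dx))


x, y = finite_difference_minimizer(99)
print(y[50], np.sin(0.5) / np.sin(1.0) - 0.5)   # polygonal line vs. exact extremal at x = 0.5
```

The comparison value in the last line is the solution of the Euler equation $y'' + y = -x$ for this test case; a dense solver is used only for brevity.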


5 Here, $\mathcal{M}$ will be a linear space only if $A = B = 0$ (cf. Remark 3).

41. The Sturm-Liouville Problem

In this section, we illustrate the application of direct variational methods to differential equations (cf. the remarks on p. 192), by studying the following boundary value problem, known as the Sturm-Liouville problem: Let $P = P(x) > 0$ and $Q = Q(x)$ be two given functions, where $Q$ is continuous and $P$ is continuously differentiable, and consider the differential equation

$$-(Py')' + Qy = \lambda y \tag{14}$$

(known as the Sturm-Liouville equation), subject to the boundary conditions

$$y(a) = 0, \qquad y(b) = 0. \tag{15}$$

It is required to find the eigenfunctions and eigenvalues of the given boundary value problem, i.e., the nontrivial solutions 6 of (14), (15) and the corresponding values of the parameter $\lambda$.

Theorem. The Sturm-Liouville problem (14), (15) has an infinite sequence of eigenvalues $\lambda^{(1)}, \lambda^{(2)}, \ldots$, and to each eigenvalue $\lambda^{(n)}$ there corresponds an eigenfunction $y^{(n)}$ which is unique to within a constant factor.

The proof of this theorem will be carried out in stages, and at the same time we shall derive a method for approximating the eigenvalues $\lambda^{(n)}$ and eigenfunctions $y^{(n)}$.

41.1. We begin by observing that (14) is the Euler equation corresponding to the problem of finding an extremum of the quadratic functional

$$J[y] = \int_a^b (Py'^2 + Qy^2)\,dx, \tag{16}$$

subject to the boundary conditions (15) and the subsidiary condition 7

$$\int_a^b y^2\,dx = 1. \tag{17}$$

Thus, if $y(x)$ is a solution of this variational problem, it is also a solution of the differential equation (14), satisfying the boundary conditions (15). Moreover, $y(x)$ is not identically zero, because of the condition (17).

6 In other words, the solutions which are not identically zero. For any value of $\lambda$, (14) and (15) are trivially satisfied by the function $y(x) \equiv 0$.

7 Use the theorem on p. 43, changing $\lambda$ to $-\lambda$.

Next, we apply the Ritz method (see Sec. 40.1) to the functional (16), first verifying that it is bounded from below, as required [cf. formula (1)]. Since $P(x) > 0$, this fact follows from the inequality

$$\int_a^b (Py'^2 + Qy^2)\,dx > \int_a^b Qy^2\,dx \geq M\int_a^b y^2\,dx = M,$$

where

$$M = \min_{a \leq x \leq b} Q(x).$$

For simplicity, we assume that $a = 0$, $b = \pi$, and we choose $\{\sin nx\}$ as the complete sequence of functions $\{\varphi_n(x)\}$ used in the Ritz method. This sequence also has the desirable feature of being orthogonal, i.e.,

$$\int_0^\pi \sin kx \sin lx\,dx = 0 \qquad (k \neq l).$$

If a linear combination

$$\sum_{k=1}^{n} \alpha_k \sin kx \tag{18}$$

is to be admissible, it must satisfy the conditions (15) and (17). The condition (15) is automatically satisfied by our choice of the functions $\sin nx$, but (17) leads to the requirement

$$\int_0^\pi \Bigl(\sum_{k=1}^{n} \alpha_k \sin kx\Bigr)^2 dx = \frac{\pi}{2}\sum_{k=1}^{n} \alpha_k^2 = 1. \tag{19}$$

Moreover, for a linear combination (18), the functional $J[y]$ reduces to

$$J_n(\alpha_1, \ldots, \alpha_n) = \int_0^\pi \Bigl\{P(x)\Bigl[\Bigl(\sum_{k=1}^{n} \alpha_k \sin kx\Bigr)'\Bigr]^2 + Q(x)\Bigl(\sum_{k=1}^{n} \alpha_k \sin kx\Bigr)^2\Bigr\}\,dx, \tag{20}$$

which is a function of the $n$ variables $\alpha_1, \ldots, \alpha_n$ (in fact, a quadratic form in these variables).

Thus, in terms of the variables $\alpha_1, \ldots, \alpha_n$, our problem is to minimize $J_n(\alpha_1, \ldots, \alpha_n)$ on the surface $\sigma_n$ of the $n$-dimensional sphere with equation (19). Since $\sigma_n$ is a compact set and $J_n(\alpha_1, \ldots, \alpha_n)$ is continuous on $\sigma_n$, $J_n(\alpha_1, \ldots, \alpha_n)$ has a minimum $\lambda_n^{(1)}$ at some point $\alpha_1^{(1)}, \ldots, \alpha_n^{(1)}$ of $\sigma_n$. 8 Let

$$y_n^{(1)}(x) = \sum_{k=1}^{n} \alpha_k^{(1)} \sin kx$$

be the linear combination (18) achieving the minimum $\lambda_n^{(1)}$. If this procedure is carried out for $n = 1, 2, \ldots$, we obtain a sequence of numbers

$$\lambda_1^{(1)}, \lambda_2^{(1)}, \ldots, \tag{21}$$

and a corresponding sequence of functions

$$y_1^{(1)}(x), y_2^{(1)}(x), \ldots \tag{22}$$

8 See e.g., T. M. Apostol, op. cit., Theorem 4-20, p. 73.

Noting that $\sigma_n$ is the subset of $\sigma_{n+1}$ obtained by setting $\alpha_{n+1} = 0$, while

$$J_n(\alpha_1, \ldots, \alpha_n) = J_{n+1}(\alpha_1, \ldots, \alpha_n, 0),$$

we see that

$$\lambda_{n+1}^{(1)} \leq \lambda_n^{(1)}, \tag{23}$$

since increasing the domain of definition of a function can only decrease its minimum. It follows from (23) and the fact that $J[y]$ is bounded from below that the limit

$$\lambda^{(1)} = \lim_{n \to \infty} \lambda_n^{(1)} \tag{24}$$

exists.
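The construction of the numbers $\lambda_n^{(1)}$ can be tried out numerically. The sketch below assumes hypothetical sample coefficients $P(x) = 1$, $Q(x) = x$ and uses Gauss-Legendre quadrature for the integrals in (20); the function names are illustrative only. Since (20) is a quadratic form and (19) is a sphere, the constrained minimum is the smallest eigenvalue of a symmetric matrix.

```python
import numpy as np


def ritz_matrix(P, Q, n, quad_points=200):
    """Matrix of the quadratic form (20) in the basis sin(kx), k = 1..n, on [0, pi]."""
    nodes, weights = np.polynomial.legendre.leggauss(quad_points)
    x = 0.5 * np.pi * (nodes + 1.0)       # map [-1, 1] onto [0, pi]
    w = 0.5 * np.pi * weights
    k = np.arange(1, n + 1)[:, None]
    S = np.sin(k * x)                     # row k-1 holds sin(kx) at the nodes
    C = k * np.cos(k * x)                 # row k-1 holds (sin(kx))' at the nodes
    return (C * (w * P(x))) @ C.T + (S * (w * Q(x))) @ S.T


def lowest_ritz_eigenvalue(P, Q, n):
    """Minimum of (20) on the sphere (19), i.e. (pi/2) * sum(alpha_k^2) = 1."""
    A = ritz_matrix(P, Q, n)
    # Writing alpha = sqrt(2/pi) * u with |u| = 1 turns (20) into (2/pi) u^T A u,
    # whose minimum over unit vectors is the smallest eigenvalue of (2/pi) A.
    return float(np.linalg.eigvalsh((2.0 / np.pi) * A)[0])


P = lambda x: np.ones_like(x)   # assumed sample coefficient P(x) = 1
Q = lambda x: x                 # assumed sample coefficient Q(x) = x
for n in range(1, 6):
    print(n, lowest_ritz_eigenvalue(P, Q, n))
```

The printed values form a nonincreasing sequence bounded from below, in agreement with (23) and (24).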


41.2. Now that we have proved the convergence of the sequence of numbers (21), representing the minima of the functional

$$\int_0^\pi (Py'^2 + Qy^2)\,dx$$

on the sets of functions of the form

$$\sum_{k=1}^{n} \alpha_k \sin kx$$

satisfying the condition (19), it is natural to try to prove the convergence of the sequence of functions (22) for which these minima are achieved. We first prove a weaker result:

Lemma 1. The sequence $\{y_n^{(1)}(x)\}$ contains a uniformly convergent subsequence.

Proof. For simplicity, we temporarily write $y_n(x)$ instead of $y_n^{(1)}(x)$. The sequence

$$\lambda_n^{(1)} = \int_0^\pi (Py_n'^2 + Qy_n^2)\,dx$$

is convergent and hence bounded, i.e.,

$$\int_0^\pi (Py_n'^2 + Qy_n^2)\,dx \leq M$$

for all $n$, where $M$ is some constant. Therefore

$$\int_0^\pi Py_n'^2\,dx \leq M + \Bigl|\int_0^\pi Qy_n^2\,dx\Bigr| \leq M + \max_{a \leq x \leq b} |Q(x)| = M_1,$$

and since $P(x) > 0$,

$$\int_0^\pi y_n'^2(x)\,dx \leq \frac{M_1}{\min\limits_{a \leq x \leq b} P(x)} = M_2. \tag{25}$$

Using (25), the condition

$$y_n(0) = 0,$$

and Schwarz's inequality, we find that

$$|y_n(x)|^2 = \Bigl|\int_0^x y_n'(\xi)\,d\xi\Bigr|^2 \leq \int_0^x y_n'^2(\xi)\,d\xi \int_0^x d\xi \leq M_2\pi,$$

so that $\{y_n(x)\}$ is uniformly bounded. 9 Moreover, again using Schwarz's inequality, we have

$$|y_n(x_2) - y_n(x_1)|^2 = \Bigl|\int_{x_1}^{x_2} y_n'(x)\,dx\Bigr|^2 \leq \Bigl|\int_{x_1}^{x_2} y_n'^2\,dx \cdot \int_{x_1}^{x_2} dx\Bigr| \leq M_2|x_2 - x_1|,$$

so that $\{y_n(x)\}$ is equicontinuous. 10 Thus, according to Arzelà's theorem, 11 we can select a uniformly convergent subsequence $\{y_{n_m}(x)\}$ from the sequence $\{y_n(x)\}$, and Lemma 1 is proved.

9 A family of functions $\Phi$ defined on $[a, b]$ is said to be uniformly bounded if there is a constant $M$ such that

$$|\varphi(x)| \leq M$$

for all $\varphi \in \Phi$ and all $a \leq x \leq b$.

10 A family of functions $\Phi$ defined on $[a, b]$ is said to be equicontinuous if given any $\varepsilon > 0$, there is a $\delta > 0$ such that

$$|\varphi(x_2) - \varphi(x_1)| < \varepsilon$$

for all $\varphi \in \Phi$, provided that $|x_2 - x_1| < \delta$.

11 Arzelà's theorem states that every uniformly bounded and equicontinuous sequence of functions contains a uniformly convergent subsequence (converging to a continuous limit function). See e.g., R. Courant and D. Hilbert, op. cit., vol. 1, p. 59.

We now set

$$y^{(1)}(x) = \lim_{m \to \infty} y_{n_m}(x). \tag{26}$$

Our object is to show that $y^{(1)}(x)$ satisfies the Sturm-Liouville equation (14) with $\lambda = \lambda^{(1)}$. However, we are still not in a position to take the limit as $m \to \infty$ of the integral

$$\int_0^\pi (Py_{n_m}'^2 + Qy_{n_m}^2)\,dx,$$

since as yet we know nothing about the convergence of the derivatives $y_{n_m}'$. Therefore, the fact that for each $m$, the function $y_{n_m}$ minimizes the functional $J[y]$ for $y$ in the $n_m$-dimensional space spanned by the linear combinations

$$\sum_{k=1}^{n_m} \alpha_k \sin kx$$

[subject to the condition (19) with $n = n_m$] still does not imply that the limit function $y^{(1)}(x)$ minimizes $J[y]$ for $y$ in the full space of admissible functions. To avoid this difficulty, we argue as follows:

Lemma 2. Let $y(x)$ be continuous in $[0, \pi]$, and let

$$\int_0^\pi [-(Ph')' + Q_1 h]\,y\,dx = 0 \tag{27}$$

for every function $h(x) \in \mathcal{D}_2(0, \pi)$, 12 satisfying the boundary conditions

$$h(0) = h(\pi) = 0, \qquad h'(0) = h'(\pi) = 0. \tag{28}$$

Then $y(x)$ also belongs to $\mathcal{D}_2(0, \pi)$, and

$$-(Py')' + Q_1 y = 0.$$

Proof. If we integrate (27) by parts and use (28), we find that

$$\int_0^\pi [-(Ph')' + Q_1 h]\,y\,dx = -\int_0^\pi Ph''y\,dx - \int_0^\pi P'h'y\,dx + \int_0^\pi Q_1 hy\,dx$$

$$= \int_0^\pi h''\Bigl\{-Py + \int_0^x P'y\,d\xi + \int_0^x \Bigl(\int_0^\xi Q_1 y\,dt\Bigr)d\xi\Bigr\}\,dx = 0.$$

It follows from Lemma 3, p. 10 that

$$-Py + \int_0^x P'y\,d\xi + \int_0^x \Bigl(\int_0^\xi Q_1 y\,dt\Bigr)d\xi = c_0 + c_1 x, \tag{29}$$

where $c_0$ and $c_1$ are constants. Since the right-hand side and the second and third terms in the left-hand side of (29) are obviously differentiable, $(Py)'$ exists, and in fact, differentiating (29) term by term, we find that

$$-(Py)' + P'y + \int_0^x Q_1 y\,d\xi = c_1. \tag{30}$$

Since the function $P$ is continuously differentiable and does not vanish, $y'$ exists and is continuous. Thus, (30) reduces to

$$-Py' + \int_0^x Q_1 y\,d\xi = c_1. \tag{31}$$

Since the right-hand side and the second term in the left-hand side of (31) are differentiable, it follows that $(Py')'$ exists, and in fact

$$-(Py')' + Q_1 y = 0,$$

as asserted. Moreover, by the same argument as before, $y''$ exists and is continuous.

12 I.e., for every $h(x)$ with continuous first and second derivatives in $[0, \pi]$.

41.3. We can now show that the function $y^{(1)}(x)$ defined by (26), whose existence follows from Lemma 1, satisfies the Sturm-Liouville equation

$$-(Py^{(1)\prime})' + Qy^{(1)} = \lambda^{(1)} y^{(1)}, \tag{32}$$

where $\lambda^{(1)}$ is the limit (24). According to the theory of Lagrange multipliers (cf. footnote 7, p. 43), at the point $(\alpha_1^{(1)}, \ldots, \alpha_n^{(1)})$ where the quadratic form (20) achieves its minimum subject to the subsidiary condition (19), we have

$$\frac{\partial}{\partial \alpha_r}\Bigl\{J_n(\alpha_1, \ldots, \alpha_n) - \lambda_n^{(1)} \int_0^\pi \Bigl(\sum_{k=1}^{n} \alpha_k \sin kx\Bigr)^2 dx\Bigr\} = 0 \qquad (r = 1, \ldots, n).$$

This leads to the $n$ equations

$$\int_0^\pi \Bigl\{P(x)\Bigl[\sum_{k=1}^{n} \alpha_k^{(1)} \sin kx\Bigr]'(\sin rx)' + [Q(x) - \lambda_n^{(1)}]\sum_{k=1}^{n} \alpha_k^{(1)} \sin kx \sin rx\Bigr\}\,dx = 0 \qquad (r = 1, \ldots, n). \tag{33}$$

Multiplying each of the equations (33) by an arbitrary constant $C_r^{(n)}$ and summing over $r$ from 1 to $n$, we obtain

$$\int_0^\pi [Py_n' h_n' + (Q - \lambda_n^{(1)}) y_n h_n]\,dx = 0, \tag{34}$$

where

$$h_n(x) = \sum_{r=1}^{n} C_r^{(n)} \sin rx. \tag{35}$$

An integration by parts transforms (34) into

$$\int_0^\pi [-(Ph_n')' + (Q - \lambda_n^{(1)}) h_n]\,y_n\,dx = 0. \tag{36}$$

If $h(x)$ is an arbitrary function in $\mathcal{D}_2(0, \pi)$ satisfying the boundary conditions (28), we can choose the coefficients $C_r^{(n)}$ in such a way that

$$h_n \Rightarrow h, \qquad h_n' \Rightarrow h', \qquad h_n'' \Rightarrow h''$$

(see Prob. 8). Here, the symbol $\Rightarrow$ denotes convergence in the mean, i.e., $h_n \Rightarrow h$ stands for

$$\lim_{n \to \infty} \int_0^\pi |h_n(x) - h(x)|^2\,dx = 0.$$

Since $y_{n_m}^{(1)} \to y^{(1)}$ uniformly in $[0, \pi]$, 13 it follows from (36) that

$$\lim_{m \to \infty} \int_0^\pi [-(Ph_{n_m}')' + (Q - \lambda_{n_m}^{(1)}) h_{n_m}]\,y_{n_m}^{(1)}\,dx = \int_0^\pi [-(Ph')' + (Q - \lambda^{(1)}) h]\,y^{(1)}\,dx = 0$$

(see Prob. 9). The fact that $y^{(1)}$ is an element of $\mathcal{D}_2(0, \pi)$ and satisfies the Sturm-Liouville equation (32) is now an immediate consequence of Lemma 2, with $Q_1 = Q - \lambda^{(1)}$.

So far, the function $y^{(1)}(x)$ has been defined as the limit of a subsequence $\{y_{n_m}^{(1)}(x)\}$ of the original sequence $\{y_n^{(1)}(x)\}$. We now show that the sequence $\{y_n^{(1)}(x)\}$ itself converges to $y^{(1)}(x)$. To prove this, we use the fact that for a given $\lambda$, the solution of the Sturm-Liouville equation

$$-(Py')' + Qy = \lambda y \tag{37}$$

satisfying the boundary conditions

$$y(0) = 0, \qquad y(\pi) = 0 \tag{38}$$

and the normalization condition

$$\int_0^\pi y^2(x)\,dx = 1 \tag{39}$$

is unique except for sign. Let $y^{(1)}(x)$ be a solution of (37) corresponding to $\lambda = \lambda^{(1)}$, and suppose $y^{(1)}(x_0) \neq 0$ at some point $x_0$ in $[0, \pi]$. Then choose the sign so that $y^{(1)}(x_0) > 0$. Similarly, let $y_n^{(1)}(x)$ be a solution of (37) corresponding to $\lambda = \lambda_n^{(1)}$, and choose the signs so that $y_n^{(1)}(x_0) \geq 0$ for all $n$. If $y_n^{(1)}(x)$ does not converge to $y^{(1)}(x)$, we can select another subsequence from $\{y_n^{(1)}(x)\}$ converging to another solution $\bar{y}^{(1)}(x)$ of (37), where again $\lambda = \lambda^{(1)}$. Because of the uniqueness (except for sign) of solutions of (37), subject to (38) and (39), this means that

$$\bar{y}^{(1)}(x) = -y^{(1)}(x),$$

and hence $\bar{y}^{(1)}(x_0) < 0$, which is impossible, since $y_n^{(1)}(x_0) \geq 0$ for all $n$. Therefore, $y_n^{(1)}(x) \to y^{(1)}(x)$ [in fact, uniformly], provided we choose each $y_n^{(1)}(x)$ with the proper sign.

13 We now restore the superscript on $y_n^{(1)}$.

41.4. We have just proved that the Sturm-Liouville problem has the eigenfunction $y^{(1)}(x)$, corresponding to the eigenvalue $\lambda^{(1)}$. The "next" eigenfunction $y^{(2)}(x)$ and the corresponding eigenvalue $\lambda^{(2)}$ can be found by minimizing the quadratic functional

$$J[y] = \int_0^\pi (Py'^2 + Qy^2)\,dx \tag{40}$$

subject to the same conditions (38) and (39) as before, plus an extra orthogonality condition

$$\int_0^\pi y^{(1)}(x)\,y(x)\,dx = 0. \tag{41}$$

In fact, substituting

$$y_n = \sum_{k=1}^{n} \alpha_k \sin kx \tag{42}$$

into (40), we again obtain the quadratic form $J_n(\alpha_1, \ldots, \alpha_n)$ given by (20), but this time we study $J_n(\alpha_1, \ldots, \alpha_n)$ on the set of functions of the form (42) which not only lie on the $n$-dimensional sphere $\sigma_n$ with equation (19), thereby satisfying the normalization condition (39), but are also orthogonal to the function

$$y_n^{(1)}(x) = \sum_{k=1}^{n} \alpha_k^{(1)} \sin kx,$$

i.e., satisfy the condition

$$\sum_{k=1}^{n} \alpha_k \int_0^\pi \sin kx \Bigl(\sum_{l=1}^{n} \alpha_l^{(1)} \sin lx\Bigr) dx = \frac{\pi}{2}\sum_{k=1}^{n} \alpha_k \alpha_k^{(1)} = 0. \tag{43}$$

This is the equation of an $(n - 1)$-dimensional hyperplane, passing through the origin of coordinates in $n$ dimensions. Its intersection with the sphere (19) is an $(n - 1)$-dimensional sphere $\hat{\sigma}_{n-1}$. By the same argument as before (cf. footnote 8), $J_n(\alpha_1, \ldots, \alpha_n)$ has a minimum $\lambda_n^{(2)}$ on $\hat{\sigma}_{n-1}$. It is not hard to see that

$$\lambda_{n+1}^{(2)} \leq \lambda_n^{(2)}$$

[cf. (23)], and hence the limit

$$\lambda^{(2)} = \lim_{n \to \infty} \lambda_n^{(2)}$$

exists, since $J[y]$ is bounded from below. Moreover, it is obvious that

$$\lambda^{(1)} \leq \lambda^{(2)}. \tag{44}$$

Now let

$$y_n^{(2)} = \sum_{k=1}^{n} \alpha_k^{(2)} \sin kx$$

be the linear combination (42) achieving the minimum $\lambda_n^{(2)}$, where, of course, the point $(\alpha_1^{(2)}, \ldots, \alpha_n^{(2)})$ lies on the sphere $\hat{\sigma}_{n-1}$. As before, we can show that the sequence $\{y_n^{(2)}(x)\}$ converges uniformly to a limit function $y^{(2)}(x)$ which satisfies the Sturm-Liouville equation (37) [with $\lambda = \lambda^{(2)}$], the boundary conditions (38), the normalization condition (39), and the orthogonality condition (41). In other words, $y^{(2)}(x)$ is the eigenfunction of the Sturm-Liouville problem corresponding to the eigenvalue $\lambda^{(2)}$. Since orthogonal functions cannot be linearly dependent, and since only one eigenfunction corresponds to each eigenvalue (except for a constant factor), we have the strict inequality

$$\lambda^{(1)} < \lambda^{(2)},$$

instead of (44). Finally, we note that by repeating the above argument, with obvious modifications, we can obtain further eigenvalues $\lambda^{(3)}, \lambda^{(4)}, \ldots$, and corresponding eigenfunctions $y^{(3)}(x), y^{(4)}(x), \ldots$.
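The step from $\lambda_n^{(1)}$ to $\lambda_n^{(2)}$ can be sketched numerically as follows. This is only an illustration under assumptions: the sample coefficients $P(x) = 1$, $Q(x) = x$ and all function names are hypothetical, and the integrals in (20) are evaluated by Gauss-Legendre quadrature. The form (20) is first minimized on the sphere (19), and then minimized again on the intersection of that sphere with the hyperplane (43).

```python
import numpy as np


def ritz_matrix(P, Q, n, quad_points=200):
    """Matrix of the quadratic form (20) in the basis sin(kx), k = 1..n, on [0, pi]."""
    nodes, weights = np.polynomial.legendre.leggauss(quad_points)
    x = 0.5 * np.pi * (nodes + 1.0)       # map [-1, 1] onto [0, pi]
    w = 0.5 * np.pi * weights
    k = np.arange(1, n + 1)[:, None]
    S = np.sin(k * x)
    C = k * np.cos(k * x)
    return (C * (w * P(x))) @ C.T + (S * (w * Q(x))) @ S.T


def first_two_ritz_eigenvalues(P, Q, n):
    """Return (lambda_n^(1), lambda_n^(2)) for n >= 2."""
    A = (2.0 / np.pi) * ritz_matrix(P, Q, n)   # form (20) rescaled to the unit sphere
    vals, vecs = np.linalg.eigh(A)
    u1 = vecs[:, 0]                            # direction of (alpha_1^(1), ..., alpha_n^(1))
    # Orthonormal basis of the hyperplane (43): the rows of Vh after the first.
    basis = np.linalg.svd(u1[None, :])[2][1:].T
    # Minimum of the form restricted to the sphere inside that hyperplane.
    lam2 = np.linalg.eigvalsh(basis.T @ A @ basis)[0]
    return float(vals[0]), float(lam2)


P = lambda x: np.ones_like(x)   # assumed sample coefficient P(x) = 1
Q = lambda x: x                 # assumed sample coefficient Q(x) = x
print(first_two_ritz_eigenvalues(P, Q, 6))
```

In finite dimensions the second minimum coincides with the second smallest eigenvalue of the matrix of the form, which gives a convenient check on the orthogonality construction.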

For further material on the use of direct methods in the calculus of variations, we refer the reader to the abundant literature on the subject. 14

14 See e.g., N. Krylov, Les méthodes de solution approchée des problèmes de la physique mathématique, Mémorial des Sciences Mathématiques, fascicule 49, Gauthier-Villars et Cie., Paris (1931); S. G. Mikhlin, Прямые методы в математической физике (Direct Methods in Mathematical Physics), Gos. Izd. Tekh.-Teor. Lit., Moscow (1950); S. G. Mikhlin, Вариационные методы в математической физике (Variational Methods in Mathematical Physics), Gos. Izd. Tekh.-Teor. Lit., Moscow (1957); L. V. Kantorovich and V. I. Krylov, Approximate Methods of Higher Analysis, translated by C. D. Benster, Interscience Publishers, Inc., New York (1958).

PROBLEMS

1. Let the functional $J[y]$ be such that $J[y] > -\infty$ for some admissible function, and let

$$\sup_y J[y] = \mu < +\infty,$$

where sup denotes the least upper bound or supremum. By analogy with the treatment given in Sec. 39, define a maximizing sequence, and then state and prove the corresponding version of the theorem on p. 194.

2. Use the Ritz method to find an approximate solution of the problem of minimizing the functional

$$J[y] = \int_0^1 (y'^2 - y^2 - 2xy)\,dx, \qquad y(0) = y(1) = 0,$$

and compare the answer with the exact solution.

Hint. Choose the sequence $\{\varphi_n(x)\}$ (see p. 195) to be

$$x(1 - x), \; x^2(1 - x), \; x^3(1 - x), \ldots$$

3. Use the Ritz method to find an approximate solution of the extremum problem associated with the functional

$$J[y] = \int_0^1 (x^3 y''^2 + 100xy^2 - 20xy)\,dx, \qquad y(1) = y'(1) = 0.$$

Hint. Choose the sequence $\{\varphi_n(x)\}$ to be

$$(x - 1)^2, \; x(x - 1)^2, \; x^2(x - 1)^2, \ldots$$

4. Use the Ritz method to find an approximate solution of the problem of minimizing the functional

$$J[y] = \int_0^2 (y'^2 + y^2 + 2xy)\,dx, \qquad y(0) = y(2) = 0,$$

and compare the answer with the exact solution.

5. Use the Ritz method to find an approximate solution of the equation

$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = -1$$

inside the square

$$R: \quad -a \leq x \leq a, \quad -a \leq y \leq a,$$

where $u$ vanishes on the boundary of $R$.

Hint. Study the functional

$$J[u] = \iint_R \Bigl[\Bigl(\frac{\partial u}{\partial x}\Bigr)^2 + \Bigl(\frac{\partial u}{\partial y}\Bigr)^2 - 2u\Bigr]\,dx\,dy,$$

and choose the two-dimensional generalization of the sequence $\{\varphi_n(x)\}$ to be

$$(x^2 - a^2)(y^2 - a^2), \; (x^2 + y^2)(x^2 - a^2)(y^2 - a^2), \ldots$$

6. Write the Sturm-Liouville equation associated with the quadratic functional

$$J[y] = \int_a^b (c_1 y'^2 + c\,y^2)\,dx,$$

where $c$ and $c_1 > 0$ are constants, subject to the boundary conditions

$$y(a) = 0, \qquad y(b) = 0.$$

Find the corresponding eigenvalues and eigenfunctions.

7. Formulate a variational problem leading to the Sturm-Liouville equation 
(14) subject to the boundary conditions 

y'(a) = 0, y'(b) = 0, 

instead of the boundary conditions (15). 

Hint. Recall the natural boundary conditions (29) of Sec. 6. 

8. Prove that any function $h(x) \in \mathcal{D}_2(0, \pi)$ satisfying the boundary conditions (28) can be approximated in the mean by a linear combination

$$h_n(x) = \sum_{r=1}^{n} C_r^{(n)} \sin rx,$$

where at the same time $h_n'(x)$ approximates $h'(x)$ and $h_n''(x)$ approximates $h''(x)$ [in the mean]. Show that the coefficients $C_r^{(n)}$ need not depend on $n$ and can be written simply as $C_r$.

Hint. Form the Fourier sine series of $h''(x)$ and integrate it twice term by term.

9. Show that if $f_n(x) \to f(x)$ in the mean and $g_n(x) \to g(x)$ uniformly in some interval $[a, b]$, then

$$\int_a^b f_n(x)\,g_n(x)\,dx \to \int_a^b f(x)\,g(x)\,dx.$$

Hint. Use Schwarz's inequality.



Appendix 


PROPAGATION OF DISTURBANCES 
AND THE 

CANONICAL EQUATIONS 1 


In this appendix, we consider the propagation of “disturbances” in a 
medium which is regarded as being both inhomogeneous and anisotropic. 
Thus, in general, the velocity of propagation of a disturbance at a given point 
of the medium will depend both on the position of the point and on the 
direction of propagation of the disturbance. We also make the following 
two assumptions about the process under consideration: 

1. Each point can be in only one of two states, excitation or rest , i.e., no 
concept of the intensity of the disturbance is introduced. 

2. If a disturbance arrives at the point P at the time t , then starting from 
the time t , the point P itself serves as a source of further disturbances 
propagating in the medium. 

In the analysis given here, our aim is to show that a study of processes of excitation of the kind described, together with purely geometric considerations, can be used to derive such basic concepts of the calculus of variations as the canonical equations, the Hamiltonian function, the Hamilton-Jacobi equation, etc. The treatment given here does not rely upon the derivations of these concepts given in the main body of the book (see Secs. 16, 23), and in fact can be used to replace the previous derivations. The reader acquainted with optics will recognize that we are essentially constructing a mathematical model of the familiar Huygens' principle. 2

1 The authors would like to acknowledge discussions with M. L. Tsetlyn on the material presented here.

1. Statement of the problem. Let the medium in which the disturbance propagates fill a space $\mathcal{X}$, which for simplicity we take to be $n$-dimensional Euclidean space. Thus, every point $x \in \mathcal{X}$ is specified by a set of $n$ real numbers $x^1, \ldots, x^n$. Choosing a fixed point $x_0 \in \mathcal{X}$, we consider the set of all smooth curves

$$x = x(s) \tag{1}$$

passing through $x_0$. The set of vectors tangent to the curves (1) at the point $x_0$, i.e., the set of vectors

$$x' = \left.\frac{dx}{ds}\right|_{x = x_0},$$

forms an $n$-dimensional linear space, which we call the tangent space to $\mathcal{X}$ at $x_0$ and denote by $\mathcal{T}(x_0)$. Note that the end points of the vectors in any tangent space $\mathcal{T}(x)$ are points of $\mathcal{X}$ itself. 3

Since the medium is inhomogeneous and anisotropic, the velocity of propagation of disturbances in $\mathcal{X}$ depends on position and direction, i.e., on $x$ and $x'$. Let $f(x, x')$ denote the reciprocal of this velocity. Then, if $x(s)$ and $x(s + ds)$ are two neighboring points lying on some curve $x = x(s)$, the time $dt$ which it takes the disturbance to go from the point $x(s)$ to the point $x(s + ds)$ can be written in the form

$$dt = f(x, x')\,ds,$$

and the time it takes the disturbance to propagate along some finite path joining the points $x_0 = x(s_0)$ and $x_1 = x(s_1)$ equals

$$t = \int_{s_0}^{s_1} f(x, x')\,ds. \tag{2}$$


2 See e.g., B. B. Baker and E. T. Copson, The Mathematical Theory of Huygens' Principle, Oxford University Press, New York (1939).

3 In the case considered, the tangent space $\mathcal{T}(x)$ is particularly simple, and in fact, is just an $n$-dimensional Euclidean space with origin at $x$. More generally, $\mathcal{X}$ can be an $n$-dimensional differentiable manifold, and then the end points of vectors in $\mathcal{T}(x)$ need no longer lie in $\mathcal{X}$. However, the analysis given below can easily be extended to this case, by exploiting the "local flatness" of $\mathcal{X}$.

Suppose the point $x_0$ is "excited," and consider all possible paths joining $x_0$ and $x_1$. Then, because of the "off or on" character of the excitation, the only path which plays any role in the propagation process is the one along which the disturbance propagates in the smallest time, say $t$. (Disturbances arriving at $x_1$ via some other path which is traversed in a time $> t$ will arrive at $x_1$ "too late" to have any further effect on the propagation process, since $x_1$ will already be found in a state of excitation.) In other words,

$$t = \min \int_{s_0}^{s_1} f(x, x')\,ds,$$

where the minimum is taken with respect to all curves $x = x(s)$ joining the points $x_0$ and $x_1$. Thus, the propagation of disturbances in the medium obeys the familiar Fermat principle (p. 34), i.e., among all paths joining $x_0$ and $x_1$, the disturbance always propagates along the path which it traverses in the least time. We shall refer to such paths as the trajectories of the disturbance.
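A small numerical illustration of the least-time principle may be helpful. The sketch below makes assumptions that go beyond the text: the medium is taken to be isotropic and piecewise homogeneous, with hypothetical speeds `v1` above the line $y = 0$ and `v2` below it, so that a candidate path from $(0, a)$ to $(d, -b)$ is a broken line crossing the interface at some point $(x, 0)$. Minimizing the travel time over the crossing point reproduces the familiar law of refraction.

```python
import math


def travel_time(x, a, b, d, v1, v2):
    """Time along the broken path (0, a) -> (x, 0) -> (d, -b)."""
    return math.hypot(x, a) / v1 + math.hypot(d - x, b) / v2


def fastest_crossing(a, b, d, v1, v2, iterations=200):
    """Ternary search for the crossing point with the least travel time.

    The travel time is a convex function of x, so the search converges
    to its unique minimum on [0, d].
    """
    lo, hi = 0.0, d
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if travel_time(m1, a, b, d, v1, v2) < travel_time(m2, a, b, d, v1, v2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


a, b, d, v1, v2 = 1.0, 1.0, 2.0, 1.0, 0.5      # hypothetical geometry and speeds
x = fastest_crossing(a, b, d, v1, v2)
sin1 = x / math.hypot(x, a)                     # sine of the angle of incidence
sin2 = (d - x) / math.hypot(d - x, b)           # sine of the angle of refraction
print(sin1 / v1, sin2 / v2)                     # the two ratios agree at the minimum
```

In the notation of this appendix, the example corresponds to the choice $f(x, x') = |x'|/v(x)$ with a piecewise constant speed $v(x)$.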

Next, we state a physically plausible set of properties for the function $f(x, x')$:

1. The propagation time along any curve is positive, and hence

$$f(x, x') > 0 \quad \text{if } x' \neq 0. \tag{3}$$

2. The propagation time along any curve $\gamma$ joining $x_0$ and $x_1$, given by the integral (2), depends only on $\gamma$ and not on how $\gamma$ is parameterized. It follows by the argument given in Chap. 2, Sec. 10 that $f(x, x')$ is positive-homogeneous of degree 1 in $x'$:

$$f(x, \lambda x') = \lambda f(x, x') \quad \text{for every } \lambda > 0. \tag{4}$$

In particular, (4) implies that

$$f(x, x' + \bar{x}') = f(x, x') + f(x, \bar{x}'), \tag{5}$$

if $\bar{x}' = \lambda x'$, where $\lambda > 0$.

3. The time it takes a disturbance to traverse a curve $\gamma$ connecting $x_0$ to $x_1$ is the same as the time it takes a disturbance to traverse $\gamma$ in the opposite direction from $x_1$ to $x_0$, and hence

$$f(x, -x') = f(x, x'). \tag{6}$$

4. If the medium is homogeneous, so that $f$ is a function of direction only, then the disturbance propagates in straight lines (see Prob. 1). In particular, no disturbance emanating from a given point $x_0$ can arrive at another point $x_1$ more quickly by taking a path consisting of two straight line segments than by going along the straight line segment joining $x_0$ and $x_1$. This implies the convexity condition

$$f(x' + \bar{x}') \leq f(x') + f(\bar{x}')$$

(see Prob. 2). If $f$ depends on $x$ in a sufficiently smooth way (e.g., if the derivatives $\partial f/\partial x^1, \ldots, \partial f/\partial x^n$ exist), the same argument shows that the convexity condition

$$f(x, x' + \bar{x}') \leq f(x, x') + f(x, \bar{x}') \tag{7}$$

holds for sufficiently small $x'$, $\bar{x}'$, but then (7) holds for all $x'$, $\bar{x}'$ because of the homogeneity property (4).

5. Actually, we strengthen the condition (7) somewhat, by requiring that $f$ satisfy the strict convexity condition, consisting of (7) plus the stipulation that (5) holds only if $\bar{x}' = \lambda x'$, where $\lambda > 0$.

Now suppose we have a disturbance which at time $t = 0$ occupies some region of excitation $R$ in $\mathcal{X}$, and propagates further as time evolves. The boundary of $R$ will be called the wave front. Let

$$S(x, t) = 0$$

be the equation of the wave front at the time $t$. Then our problem can be stated as follows: Find the equation satisfied by the function $S(x, t)$ describing the wave front, and find the equations of the trajectories of the disturbance.

2. Introduction of a norm in $\mathcal{T}(x)$. Our next step is to use the function $f(x, x')$ to introduce a norm in the $n$-dimensional tangent space $\mathcal{T}(x)$. This can be done by defining the norm of the vector $x' = 0$ to be zero and setting

$$\|x'\| = f(x, x') \tag{8}$$

for all vectors $x' \neq 0$ in $\mathcal{T}(x)$. The fact that $\|x'\|$ actually meets all the requirements for a norm (see p. 6) is an immediate consequence of (3), (4), (6) and (7). The set of all vectors in $\mathcal{T}(x)$ such that

$$f(x, x') = \|x'\| = \alpha \tag{9}$$

is called a sphere of radius $\alpha$ in $\mathcal{T}(x)$, with center at the point $x$. The sphere (9) is just the boundary of the closed region of $\mathcal{T}(x)$ [and hence of $\mathcal{X}$] which is excited during the time $\alpha$ by a disturbance originally concentrated at the point $x$. In this language, our problem can be rephrased as follows: Suppose a tangent space $\mathcal{T}(x)$, equipped with the norm (8) satisfying the strict convexity condition, is defined at each point $x$ of an $n$-dimensional space $\mathcal{X}$. Find the equations describing the propagation of disturbances in $\mathcal{X}$, if during the time $dt$ the disturbance originally at $x$ "spreads out and fills" the sphere

$$f(x, dx) = dt.$$

3. The conjugate space $\bar{\mathcal{T}}(x)$. Let $\varphi[x']$ be a linear functional (see p. 8), defined on the tangent space $\mathcal{T}(x)$. Then there is a unique vector

$$p = (p_1, \ldots, p_n)$$

such that

$$\varphi[x'] = (p, x')$$

for all $x' \in \mathcal{T}(x)$, where by $(p, x')$ is meant the scalar product

$$\sum_{i=1}^{n} p_i x^{i\prime} = p_1 x^{1\prime} + \cdots + p_n x^{n\prime}$$

(see Prob. 3). 4 Conversely, any scalar product $(p, x')$ obviously defines a linear functional on $\mathcal{T}(x)$. The set of all linear functionals on $\mathcal{T}(x)$, or equivalently the set of all vectors $p$, is itself an $n$-dimensional linear space, called the conjugate space of $\mathcal{T}(x)$ and denoted by $\bar{\mathcal{T}}(x)$. We define the norm of a vector $p \in \bar{\mathcal{T}}(x)$ by the formula 5

$$\|p\| = \sup \frac{(p, x')}{\|x'\|}, \tag{10}$$

where the least upper bound is taken over all vectors $x' \neq 0$ in $\mathcal{T}(x)$ [see Prob. 4]. In the present context, we write $H(x, p)$ instead of $\|p\|$, i.e.,

$$H(x, p) = \sup \frac{(p, x')}{f(x, x')}. \tag{11}$$

It can be shown that the transition from the function $f(x, x')$ to the function $H(x, p)$ defined by (11) is just the parametric form of the Legendre transformation discussed in Sec. 18.
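To make the definition (11) concrete, here is a minimal numerical sketch under an assumption not made in the text: at a fixed point $x$, take $f(x, x') = \sqrt{x'^{T} G x'}$ with a hypothetical symmetric positive definite matrix `G` (a two-dimensional anisotropic medium). For such an $f$, Schwarz's inequality gives the closed form $H(x, p) = \sqrt{p^{T} G^{-1} p}$, and the code compares this with a direct sampling of the supremum over directions.

```python
import numpy as np

# Hypothetical symmetric positive definite matrix describing the medium at a point x.
G = np.array([[2.0, 0.5],
              [0.5, 1.0]])


def f(xp):
    """Assumed norm f(x, x') = sqrt(x'^T G x'); xp may be one vector or an array of vectors."""
    return np.sqrt(np.einsum("...i,ij,...j->...", xp, G, xp))


def H_sampled(p, samples=200_000):
    """Approximate sup (p, x') / f(x, x') by sampling unit directions x'.

    The ratio is homogeneous of degree 0 in x', so directions suffice.
    """
    theta = np.linspace(0.0, 2.0 * np.pi, samples, endpoint=False)
    xp = np.stack([np.cos(theta), np.sin(theta)], axis=-1)
    return float(np.max(xp @ p / f(xp)))


def H_closed_form(p):
    """For this particular f, the supremum equals sqrt(p^T G^{-1} p)."""
    return float(np.sqrt(p @ np.linalg.solve(G, p)))


p = np.array([1.0, -2.0])
print(H_sampled(p), H_closed_form(p))
```

The sampled value can only approach the supremum from below, so agreement of the two numbers up to the sampling resolution is the expected outcome.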

4. The propagation process. Suppose the wave front at the time t is the 
surface a*, with equation 

S(x , 0 = 0. (12) 

We now examine in more detail the mechanism governing the evolution of a t 
in time. By hypothesis, each point of a t serves as a source of new distur¬ 
bances, which during the time dt excite the region bounded by the sphere 

f(x , dx) = dt. (13) 

Since the function/(x, x') determining the propagation process is assumed to 
be differentiable and strictly convex (in the sense explained above), there is a 
unique hyperplane tangent to each point of the sphere (13), and this hyper¬ 
plane has only one point in common with the sphere, i.e., its point of tangency. 
If we construct a family of spheres (13), one for each point x e a t , then the 
wave front a t + dt at the time t + dt , with equation 

S(x , t + dt) = 0, (14) 

is just the envelope E of this family of spheres. In fact, E is the “interface” 
separating the points of 3C which can be reached from a t in times ^ dt from 
the points which can only be reached from a t in times > dt. This construction 
has two important implications: 


4 The reader familiar with tensor analysis will note that here we make a distinction 
between contravariant vectors like x', with components x 1 ' indexed by superscripts, 
and covariant vectors like p , with components p t indexed by subscripts. See e.g., 
G. E. Shilov, op. cit Sec. 39. 

5 By sup is meant the least upper bound or supremum. 
1. Given a point $x \in \sigma_t$, there is a unique point $x + dx \in \sigma_{t+dt}$ which is
excited after the time $dt$ by a disturbance initially at $x$. In fact, $x + dx$
is the point of $\sigma_{t+dt}$ lying on the (unique) hyperplane tangent to both
the sphere (13) and $\sigma_{t+dt}$. To see this, we observe that it takes a time $> dt$ for a
disturbance starting from $x$ to reach any other point of $\sigma_{t+dt}$.⁶ Thus,
there is a unique direction of propagation defined at each point $x \in \sigma_t$,
and it is clear that a disturbance leaving $x$ in this direction will arrive at
the surface $\sigma_{t+dt}$ more quickly than a disturbance leaving $x$ in any other
direction, as required by Fermat’s principle.

2. Conversely, given a point $x + dx \in \sigma_{t+dt}$, there is a unique point
$x \in \sigma_t$ which at the time $t$ was the source of the disturbance reaching
$x + dx$ at the time $t + dt$. In fact, $x$ is just the center of the (unique)
sphere of radius $dt$ which shares a tangent hyperplane with $\sigma_{t+dt}$.
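The envelope construction can be illustrated numerically in the simplest setting. The following is a minimal sketch, assuming a homogeneous isotropic plane medium in which the "sphere" (13) is an ordinary disc of radius $v\,dt$ and the front $\sigma_t$ is a circle of radius $r$; the speed `v`, the radius `r` and all function names are illustrative assumptions, not taken from the text. For a convex front, the outer envelope of the discs can be described by its support function, which is the support function of the front plus the disc radius.

```python
import math

def advance_front_support(centers, directions, radius):
    """Support function of the outer envelope of discs of the given radius
    centred at `centers`: h(n) = max over centres c of <c, n>, plus radius.
    Valid as a description of the envelope only for a convex front."""
    return [max(cx * nx + cy * ny for cx, cy in centers) + radius
            for nx, ny in directions]

N = 360
angles = [2 * math.pi * k / N for k in range(N)]
dirs = [(math.cos(a), math.sin(a)) for a in angles]

r, v, dt = 1.0, 2.0, 0.1                       # assumed front radius, speed, time step
front = [(r * c, r * s) for c, s in dirs]      # sampled circular front sigma_t
h_new = advance_front_support(front, dirs, v * dt)
```

Under these assumptions every entry of `h_new` should be close to `r + v * dt`, i.e., the new front is again a circle, with radius increased by $v\,dt$.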


5. The Hamilton-Jacobi equation. As was just shown, every hyperplane
tangent to the surface $\sigma_{t+dt}$ with equation (14) must also be tangent to some
sphere of radius $dt$ whose center lies on the surface $\sigma_t$ with equation (12).
This fact can be used to derive a differential equation satisfied by the function
$S(x, t)$. First, we observe that every hyperplane in the tangent space $\mathcal{T}(x)$
can be written in the form

$$\sum_{i=1}^n p_i x^{i\prime} = \text{const},$$

where $p = (p_1, \dots, p_n)$ is a vector in the conjugate space $\overline{\mathcal{T}}(x)$. Let $x + dx$
be an arbitrary point of $\sigma_{t+dt}$, whose “source” is the point $x \in \sigma_t$. Then
the hyperplane in $\mathcal{T}(x)$ tangent to $\sigma_{t+dt}$ at $x + dx$ has the equation

$$\sum_{i=1}^n \frac{\partial S}{\partial x^i}\,dx^i = c, \tag{15}$$

where $c$ is a constant. If the hyperplane (15) is also tangent to the sphere
(13), as required, then $c$ equals the norm of the vector

$$\nabla S = \left(\frac{\partial S}{\partial x^1}, \dots, \frac{\partial S}{\partial x^n}\right)$$

multiplied by the radius of the sphere, i.e.,

$$c = H(x, \nabla S)\,dt.$$

Therefore, (15) becomes

$$\sum_{i=1}^n \frac{\partial S}{\partial x^i}\,dx^i = H(x, \nabla S)\,dt. \tag{16}$$


6 Physically, this means that if the surface $\sigma_t$ is changed only in a small neighborhood
of the point $x$, the surface $\sigma_{t+dt}$ is also changed only in a small neighborhood of $x + dx$.
But

$$\sum_{i=1}^n \frac{\partial S}{\partial x^i}\,dx^i = -\frac{\partial S}{\partial t}\,dt, \tag{17}$$

because of the meaning of $x$ and $x + dx$: both $S(x, t) = 0$ and
$S(x + dx, t + dt) = 0$. Comparing (16) and (17), we finally obtain

$$\frac{\partial S}{\partial t} + H(x, \nabla S) = 0. \tag{18}$$

This equation describes the way the wave front evolves in time, and is just
the familiar Hamilton-Jacobi equation, already considered in Sec. 23.
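As a numerical sanity check of the Hamilton-Jacobi equation for wave fronts, consider the special case of a homogeneous isotropic medium, where $f(x, x') = |x'|/v$ with a constant speed $v$ (an assumption made purely for illustration). The norm in the conjugate space is then $H(x, p) = v|p|$, and the expanding spheres $S(x, t) = |x| - vt = 0$ should satisfy the equation. The sketch below checks this with central differences; the names `S`, `H`, `grad_S` and `hj_residual` are hypothetical helpers.

```python
import math

V = 2.0  # assumed constant propagation speed (illustrative)

def S(x, t):
    """Wave-front function: S = 0 on the sphere |x| = V * t."""
    return math.sqrt(sum(c * c for c in x)) - V * t

def H(x, p):
    """Conjugate norm for the assumed isotropic f(x, x') = |x'| / V."""
    return V * math.sqrt(sum(c * c for c in p))

def grad_S(x, t, h=1e-6):
    """Central-difference approximation of the spatial gradient of S."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((S(xp, t) - S(xm, t)) / (2 * h))
    return g

def hj_residual(x, t, h=1e-6):
    """Left-hand side dS/dt + H(x, grad S) of the Hamilton-Jacobi equation."""
    dS_dt = (S(x, t + h) - S(x, t - h)) / (2 * h)
    return dS_dt + H(x, grad_S(x, t, h))

residual = hj_residual([0.3, -1.2, 0.7], 0.5)
```

The residual should vanish up to finite-difference error at any point away from the origin, where $|x|$ is differentiable.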

We now show the relation between the trajectories of the disturbance and
the general solution of (18). It will be recalled that as a wave front evolves
in time, each of its points goes into a succession of uniquely defined points
lying on neighboring wave fronts, thereby “sweeping out” a trajectory $\gamma$
which automatically minimizes the functional (2). Thus, if we specify a
one-parameter family of wave fronts

$$S(x, t) = 0, \tag{19}$$

where the parameter is the time $t$, every point $x_0$ on some “initial” surface
$S(x, t_0) = 0$ generates a trajectory. Choosing the point $x_0$ arbitrarily, we find
that the one-parameter family of surfaces (19) determines an $(n - 1)$-parameter
family of trajectories, such that one and only one trajectory of the
family passes through each point $x \in \mathcal{X}$. More generally, let

$$S(x, t, \alpha_1, \dots, \alpha_n)$$

be a complete integral of the Hamilton-Jacobi equation depending on $n$
parameters $\alpha_1, \dots, \alpha_n$. This complete integral determines an $(n + 1)$-parameter
family of surfaces⁷

$$S(x, t, \alpha_1, \dots, \alpha_n) = 0, \tag{20}$$

which in turn determines a $(2n - 1)$-parameter family of trajectories. Then
the fact that the trajectories of the disturbances are the extremals of the
functional (2) leads to a geometric interpretation of Jacobi’s theorem (p. 91),
concerning the construction of a general solution of the system of Euler
equations of a functional from a complete integral of the corresponding
Hamilton-Jacobi equation.⁸


7 Since $S(x, t + t_0, \alpha_1, \dots, \alpha_n) = 0$ is also an integral surface of the Hamilton-Jacobi
equation for arbitrary $t_0$, the family of surfaces (20) actually depends on $n + 1$ parameters.

8 It should be noted that we are considering a parametric problem, so that there
is dependence between the Euler equations (see Sec. 10 and Remark 4 of Sec. 37). As
a result, the general solution of the $2n$ equations obtained here contains only $2n - 1$
arbitrary constants.
6. The canonical equations. To derive the differential equations satisfied
by the trajectories of the disturbance, we might use Fermat’s principle,
minimizing the functional (2) and solving the corresponding Euler equations.
However, we prefer to use our geometric model of the propagation process.
If we introduce the time $t$ as the parameter along each trajectory, it follows
from

$$f(x, dx) = dt$$

and the homogeneity of $f(x, dx)$ in the argument $dx$ that

$$f\left(x, \frac{dx}{dt}\right) = 1, \tag{21}$$

i.e., the norm of the vector $dx/dt$ is identically equal to 1. Using (16), we
find that at each point $x$, the vector $dx/dt$ (tangent to the trajectory along
which the disturbance propagates) is related to the covariant vector $p$
(determining the hyperplane tangent to the wave front) by the formula

$$\sum_{i=1}^n p_i \frac{dx^i}{dt} = H(x, p).$$


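According to (21) and the definition (11) of the norm of vectors in $\overline{\mathcal{T}}(x)$,
we see that

$$\sum_{i=1}^n \tilde{p}_i \frac{dx^i}{dt} \le H(x, \tilde{p})$$

if $\tilde{p}$ is any other vector in $\overline{\mathcal{T}}(x)$. Thus, the expression

$$\sum_{i=1}^n p_i \frac{dx^i}{dt} - H(x, p),$$

regarded as a function of $p$, achieves its maximum when $p$ is the vector
determining the hyperplane tangent to the wave front. Therefore, along
the trajectories, the conditions

$$\frac{\partial}{\partial p_i}\left[\sum_{k=1}^n p_k \frac{dx^k}{dt} - H(x, p)\right] = 0 \qquad (i = 1, \dots, n)$$

must hold, i.e.,

$$\frac{dx^i}{dt} = \frac{\partial H(x, p)}{\partial p_i} \qquad (i = 1, \dots, n). \tag{22}$$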


We have just obtained a system of $n$ ordinary differential equations of the
first order satisfied by the trajectories. Since these equations involve $2n$
unknown functions $x^1, \dots, x^n$ and $p_1, \dots, p_n$, we still need $n$ more equations
to completely describe the trajectories. To find the missing equations,
we use the fact that the surfaces representing the wave fronts at different times
are not arbitrary, but satisfy the Hamilton-Jacobi equation (18), while the
values $p_i$ at each point of a trajectory are the components $\partial S/\partial x^i$ determining
the hyperplane tangent to the wave front. In other words,

$$p_i = p_i(t) = \frac{\partial S}{\partial x^i}[x^1(t), \dots, x^n(t), t]$$

along each trajectory, and hence

$$\frac{dp_i}{dt} = \frac{d}{dt}\frac{\partial S}{\partial x^i} = \frac{\partial}{\partial t}\frac{\partial S}{\partial x^i} + \sum_{k=1}^n \frac{\partial^2 S}{\partial x^k\,\partial x^i}\frac{dx^k}{dt}. \tag{23}$$


We now introduce the following notation: If the function $H(x, p)$, where
$p_i = \partial S/\partial x^i$, is regarded as a function of $x^1, \dots, x^n$ and $t$, we indicate its
partial derivative with respect to $x^i$ by

$$\frac{\partial H}{\partial x^i},$$

whereas if $H(x, p)$ is regarded as a function of the $2n$ variables $x^1, \dots, x^n$
and $p_1, \dots, p_n$, we indicate its partial derivative with respect to $x^i$ by

$$\left.\frac{\partial H}{\partial x^i}\right|_{p = \text{const}}.$$

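Then, using the Hamilton-Jacobi equation (18), we can write (23) in the form

$$\frac{dp_i}{dt} = -\frac{\partial H}{\partial x^i} + \sum_{k=1}^n \frac{\partial^2 S}{\partial x^k\,\partial x^i}\frac{dx^k}{dt}. \tag{24}$$

Along the trajectories, we have

$$\frac{\partial H}{\partial x^i} = \left.\frac{\partial H}{\partial x^i}\right|_{p = \text{const}} + \sum_{k=1}^n \frac{\partial H}{\partial p_k}\frac{\partial p_k}{\partial x^i}, \qquad p_k = \frac{\partial S}{\partial x^k}, \tag{25}$$

and

$$\frac{dx^k}{dt} = \frac{\partial H}{\partial p_k}. \tag{26}$$

Substituting (25) and (26) into (24), we obtain $n$ differential equations

$$\frac{dp_i}{dt} = -\left.\frac{\partial H}{\partial x^i}\right|_{p = \text{const}} \qquad (i = 1, \dots, n).$$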

Combining these equations with (22), we obtain a system of $2n$ differential
equations

$$\frac{dx^i}{dt} = \frac{\partial H(x, p)}{\partial p_i}, \qquad \frac{dp_i}{dt} = -\frac{\partial H(x, p)}{\partial x^i}, \tag{27}$$

where $i = 1, \dots, n$. The integral curves of (27) are the trajectories along
which the disturbance propagates, i.e., the extremals of the functional (2).
The system (27) is of course the canonical system of Euler equations for the
variational problem associated with (2) [cf. Sec. 16], and represents the so-called
characteristic system associated with the Hamilton-Jacobi equation
(18) [cf. p. 90].
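The canonical system can be integrated numerically to trace individual trajectories ("rays"). The following is a minimal sketch, assuming an isotropic medium with $H(x, p) = v(x)\,|p|$ in the plane and using the classical fourth-order Runge-Kutta scheme; the speed function `v`, its gradient `grad_v`, and all helper names are illustrative assumptions rather than anything prescribed by the text.

```python
import math

def canonical_rhs(x, p, v, grad_v):
    """Right-hand sides of the canonical equations for H(x, p) = v(x) * |p|."""
    norm_p = math.hypot(p[0], p[1])
    speed = v(x)
    dx = [speed * pi / norm_p for pi in p]   # dx^i/dt =  dH/dp_i
    dp = [-norm_p * g for g in grad_v(x)]    # dp_i/dt = -dH/dx^i
    return dx, dp

def rk4_step(x, p, dt, v, grad_v):
    """One classical Runge-Kutta step for the pair (x, p)."""
    def shift(a, b, s):
        return [ai + s * bi for ai, bi in zip(a, b)]
    k1x, k1p = canonical_rhs(x, p, v, grad_v)
    k2x, k2p = canonical_rhs(shift(x, k1x, dt / 2), shift(p, k1p, dt / 2), v, grad_v)
    k3x, k3p = canonical_rhs(shift(x, k2x, dt / 2), shift(p, k2p, dt / 2), v, grad_v)
    k4x, k4p = canonical_rhs(shift(x, k3x, dt), shift(p, k3p, dt), v, grad_v)
    x_new = [xi + dt / 6 * (a + 2 * b + 2 * c + d)
             for xi, a, b, c, d in zip(x, k1x, k2x, k3x, k4x)]
    p_new = [pi + dt / 6 * (a + 2 * b + 2 * c + d)
             for pi, a, b, c, d in zip(p, k1p, k2p, k3p, k4p)]
    return x_new, p_new

def trace_ray(x0, p0, t_end, steps, v, grad_v):
    """Integrate a ray from (x0, p0) over the time interval [0, t_end]."""
    x, p = list(x0), list(p0)
    dt = t_end / steps
    for _ in range(steps):
        x, p = rk4_step(x, p, dt, v, grad_v)
    return x, p

# Homogeneous medium (constant v): the ray should be a straight line.
x_end, p_end = trace_ray([0.0, 0.0], [3.0, 4.0], 2.0, 200,
                         lambda x: 1.5, lambda x: [0.0, 0.0])
```

In the homogeneous case $p$ stays constant and the disturbance moves along the straight line through $x_0$ in the direction of $p$ at speed $v$, which agrees with Prob. 1 below. For a variable speed, the value of $H$ along the computed ray gives a convenient accuracy check, since $H$ is constant along solutions of the canonical equations.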


PROBLEMS 

1. Prove that if $f(x, x')$ depends on direction only, then the disturbance
propagates through the medium along straight lines.

2. Prove that if $f(x, x') = f(x')$ is independent of $x$, then $f(x')$ is precisely
the time required to traverse the vector $x'$.

3. Prove that every linear functional $\varphi[x]$ defined on an $n$-dimensional
Euclidean space of points $x = (x^1, \dots, x^n)$ is of the form

$$\varphi[x] = p_1 x^1 + \cdots + p_n x^n,$$

where $p = (p_1, \dots, p_n)$ is uniquely determined by $\varphi$.

4. Verify that formula (10) actually defines a norm for the elements $p$ of the
conjugate space $\overline{\mathcal{T}}(x)$.

5. Why is the strict convexity condition (p. 211) needed in constructing 
wave fronts for the disturbance? 



Appendix II


VARIATIONAL METHODS 
IN PROBLEMS OF 
OPTIMAL CONTROL 


In this appendix, we sketch some results obtained by L. S. Pontryagin 
and his students, in their investigations of the theory of optimal control 
processes. 1 The connection between this subject and classical variational 
theory will also be discussed. 

1. Statement of the problem. In many cases, finding the optimal “operating
regime” for a physical system (with a suitable optimality criterion) leads
to the following mathematical problem: Suppose the state of the physical
system is characterized by $n$ real numbers $x^1, \dots, x^n$, forming a vector
$x = (x^1, \dots, x^n)$ in the $n$-dimensional “phase space” $\mathcal{X}$ of the system,
and suppose the state varies with time in the way described by the system
of differential equations

$$\frac{dx^i}{dt} = f^i(x^1, \dots, x^n, u^1, \dots, u^k) \qquad (i = 1, \dots, n). \tag{1}$$

Here, the $k$ real numbers $u^1, \dots, u^k$ form a vector $u = (u^1, \dots, u^k)$ belonging
to some fixed “control region” $\Omega$, which we take to be a subset of


1 See L. S. Pontryagin, Optimal control processes, Usp. Mat. Nauk, 14, no. 1, 3 (1959);
V. G. Boltyanski, R. V. Gamkrelidze and L. S. Pontryagin, The theory of optimal
processes, I, The maximum principle, Izv. Akad. Nauk SSSR, Ser. Mat., 24, 3 (1960);
L. S. Pontryagin, V. G. Boltyanski, R. V. Gamkrelidze and E. F. Mishchenko, The
Mathematical Theory of Optimal Processes, translated and edited by K. N. Trirogoff and
L. W. Neustadt, Interscience Publishers, New York (1962). The more general case
where $\Omega$ is a topological space is considered in the first two references.

$k$-dimensional Euclidean space, and the $f^i(x, u)$ are $n$ continuous functions
defined for all $x \in \mathcal{X}$ and all $u \in \Omega$.

Now suppose we specify a vector function $u(t)$, $t_0 \le t \le t_1$, called the
control function, with values in $\Omega$. Then, substituting $u = u(t)$ in (1), we
obtain the system of differential equations

$$\frac{dx^i}{dt} = f^i[x, u^1(t), \dots, u^k(t)] \qquad (i = 1, \dots, n). \tag{2}$$

For every initial value $x_0 = x(t_0)$, this system has a definite solution, called
a trajectory. The aggregate

$$U = \{u(t), t_0, t_1, x_0\}, \tag{3}$$

consisting of a control function $u(t)$, an interval $[t_0, t_1]$ and an initial value
$x_0 = x(t_0)$, will be called a control process. Thus, to every control process,
there corresponds a trajectory, i.e., a solution of (2).

Next, let

$$f^0(x^1, \dots, x^n, u^1, \dots, u^k)$$

be a function which is defined, together with its partial derivatives

$$\frac{\partial f^0}{\partial x^i} \qquad (i = 1, \dots, n),$$

for all $x \in \mathcal{X}$ and $u \in \Omega$. To every control process $U$, we assign the number

$$J[U] = \int_{t_0}^{t_1} f^0(x, u)\,dt, \tag{4}$$

i.e., $J[U]$ is a functional defined on the set of control processes. Then,
the control process (3) is said to be optimal if the inequality

$$J[U] \le J[U^*]$$

holds for any other control process $U^*$ carrying the given point $x_0$ into the
point $x_1$, i.e., such that the corresponding trajectory $x^*(t)$ satisfies the
condition $x^*(t_1^*) = x_1$. By the optimal trajectory, we mean the trajectory
corresponding to the optimal control process. Our aim is to find necessary
conditions characterizing optimal control processes and optimal trajectories.

It should be pointed out that in calling a control process optimal, it is
assumed that some class of admissible control processes has been specified in
advance. Here, we assume that the components $u^1(t), \dots, u^k(t)$ of any
admissible control process take values in $\Omega$, and are bounded and piecewise
continuous (with left-hand and right-hand limits at every point of
discontinuity).

An important special case of the problem of optimal control is the situation
where the functional (4) reduces to the integral

$$\int_{t_0}^{t_1} dt,$$

representing the time it takes to go from the point $x_0$ to the point $x_1$.
In this case, optimality means taking the least time to go from $x_0$ to $x_1$.


2. Relation to the calculus of variations. The problem of optimal control
is intimately related to certain traditional problems of the calculus of
variations. In fact, the integral

$$\int_{t_0}^{t_1} f^0(x, u)\,dt$$

can be regarded as a functional depending on $n + k$ functions $x^1, \dots, x^n$,
$u^1, \dots, u^k$, i.e., as a functional defined on some class of curves in $n + k + 1$
dimensions. Since the functions $x^1, \dots, x^n$, $u^1, \dots, u^k$ are connected by the
equations (1), we are dealing with the problem of finding a minimum subject
to nonholonomic constraints (see p. 48). Since the boundary conditions are
equivalent to the requirement that the desired optimal trajectory $x(t)$ begin
at the point $x_0$ and end at the point $x_1$, the end points of the admissible curves
in our $(n + k + 1)$-dimensional space have to lie on two $(k + 1)$-dimensional
hyperplanes, determined by giving the coordinates $x^1, \dots, x^n$ the fixed values
$x_0^1, \dots, x_0^n$ and $x_1^1, \dots, x_1^n$.

Thus, we see that the problem of optimal control is a variant of the problem
of finding a minimum subject to subsidiary conditions. The problem of
optimal control has the special feature that we specify in advance a definite
class of admissible control processes, where the functions $u^1(t), \dots, u^k(t)$
are required to take values in a given fixed region $\Omega$, but in general are not
required to be continuous.

We can easily show that the simplest $n$-dimensional variational problem,
where the integrand does not depend on $t$ explicitly,² is a special case of the
problem of optimal control. To this end, suppose that among the curves
passing through two fixed points, it is required to find the curve for which
the functional

$$\int_{t_0}^{t_1} f^0(x^1, \dots, x^n, x^{1\prime}, \dots, x^{n\prime})\,dt \tag{5}$$

has a minimum. To paraphrase this problem as a problem of optimal
control, we need only write (5) in the form

$$\int_{t_0}^{t_1} f^0(x^1, \dots, x^n, u^1, \dots, u^n)\,dt,$$

and take the system (1) to be simply

$$\frac{dx^i}{dt} = u^i \qquad (i = 1, \dots, n).$$

2 This condition is not really a restriction, since any functional can be transformed 
into this form, e.g., by going over to the parametric form of the problem. 
3. Necessary conditions for optimality. To find necessary conditions for
a given control process and the corresponding trajectory to be optimal, we
supplement the system of equations

$$\frac{dx^i}{dt} = f^i(x, u) \qquad (i = 1, \dots, n)$$

with the extra equation

$$\frac{dx^0}{dt} = f^0(x, u),$$

where $f^0(x, u)$ is the integrand of the functional (4) which is to be minimized.
At the same time, we supplement the initial conditions

$$x^i(t_0) = x_0^i \qquad (i = 1, \dots, n) \tag{6}$$

with the extra condition

$$x^0(t_0) = 0. \tag{7}$$

For convenience, we introduce the $(n + 1)$-dimensional vector function

$$\mathbf{x}(t) = (x^0(t), x(t)) = (x^0(t), x^1(t), \dots, x^n(t)).$$

It is clear that if $U$ is an admissible control process and if $\mathbf{x} = \mathbf{x}(t)$ is the
solution of the system³

$$\frac{dx^i}{dt} = f^i(x, u) \qquad (i = 0, 1, \dots, n) \tag{8}$$

corresponding to $U$ and the initial conditions (6) and (7), then

$$J[U] = \int_{t_0}^{t_1} f^0(x, u)\,dt = x^0(t_1).$$

Thus, the problem of optimal control can be stated as follows: Find the
admissible control process $U$ for which the solution $\mathbf{x}(t)$ of the system (8),
satisfying the initial conditions (6) and (7), has the smallest possible value of
$x^0(t_1)$.
Next, in addition to the variables $x^0, x^1, \dots, x^n$, we introduce new variables
$\psi_0, \psi_1, \dots, \psi_n$ satisfying the following system of differential equations, known
as the conjugate⁴ of the system (8):

$$\frac{d\psi_i}{dt} = -\sum_{\alpha=0}^n \frac{\partial f^\alpha(x, u)}{\partial x^i}\,\psi_\alpha \qquad (i = 0, 1, \dots, n). \tag{9}$$


3 Note that the functions $f^i$, and hence the functions $\Pi$ and $\mathcal{H}$ defined below, do not
involve $x^0(t)$.

4 This system has the following geometric interpretation: In the space of vectors
$(\psi_0, \psi_1, \dots, \psi_n)$ conjugate to the space of vectors $(x^0, x^1, \dots, x^n)$ [see p. 211], consider
the hyperplane

$$\sum_{\alpha=0}^n \psi_\alpha x^\alpha = C = \text{const}$$

passing through the initial point $(0, x_0^1, \dots, x_0^n)$. Then the system (9) describes the
“transport” of this hyperplane along the trajectories corresponding to solutions of the
system (8). In other words, if the $\psi_i$ satisfy (9) and the $x^i$ satisfy (8) for $t_0 \le t \le t_1$, then

$$\sum_{\alpha=0}^n \psi_\alpha x^\alpha = C \qquad (t_0 \le t \le t_1).$$

For more details, see the second of the references cited on p. 218.
Let

$$\psi(t) = (\psi_0(t), \psi_1(t), \dots, \psi_n(t)),$$

and consider the following function of the variables $x^1, \dots, x^n$, $\psi_0, \psi_1, \dots, \psi_n$,
$u^1, \dots, u^k$:

$$\Pi(\psi, x, u) = \sum_{\alpha=0}^n \psi_\alpha f^\alpha(x, u). \tag{10}$$

In terms of $\Pi$, we can write the equations (8) and (9) in the form

$$\frac{dx^i}{dt} = \frac{\partial \Pi}{\partial \psi_i}, \qquad \frac{d\psi_i}{dt} = -\frac{\partial \Pi}{\partial x^i}, \tag{11}$$

where $i = 0, 1, \dots, n$. The equations (11) remind us of the canonical system
of Euler equations [see formula (11), p. 70]. However, they have a different
meaning, since the canonical equations form a closed system, in which the
number of equations equals the number of unknown functions, whereas (11)
involves not only $x$ and $\psi$ but also the unknown function $u$, and hence (11)
becomes a closed system only when $u$ is specified. In fact, in order to write
equations for the optimal control problem resembling the canonical equations,
we would have to use the function

$$\mathcal{H}(\psi, x) = \sup_{u \in \Omega} \Pi(\psi, x, u), \tag{12}$$

instead of the function $\Pi(\psi, x, u)$.⁵

4. The maximum principle. We can now state the following theorem,
whose proof can be found in the references cited on p. 218:

Theorem (The maximum principle). Let $U = \{u(t), t_0, t_1, x_0\}$ be an
admissible control process, and let $\mathbf{x}(t)$ be the corresponding integral curve
of the system (8) passing through the point $(0, x_0^1, \dots, x_0^n)$ for $t = t_0$, and
satisfying the conditions

$$x^1(t_1) = x_1^1, \quad \dots, \quad x^n(t_1) = x_1^n$$

for $t = t_1$. Then if the control process $U$ is optimal, there exists a continuous
vector function $\psi(t) = (\psi_0(t), \psi_1(t), \dots, \psi_n(t))$ such that

1. The function $\psi(t)$ satisfies the system (9) for $x = x(t)$, $u = u(t)$;


5 The transition from $\Pi$ to $\mathcal{H}$ is analogous to the Legendre transformation, considered
in Sec. 18.





2. For all $t$ in $[t_0, t_1]$, the function (10) achieves its maximum for
$u = u(t)$, i.e.,

$$\Pi[\psi(t), x(t), u(t)] = \mathcal{H}[\psi(t), x(t)], \tag{13}$$

where the function $\mathcal{H}$ is defined by (12);

3. The relations

$$\psi_0(t_1) \le 0, \qquad \mathcal{H}[\psi(t_1), x(t_1)] = 0 \tag{14}$$

hold at the time $t_1$. Actually, if $\psi(t)$, $x(t)$ and $u(t)$ satisfy the system
(8), (9) and the condition (13), the functions $\psi_0(t)$ and $\mathcal{H}[\psi(t), x(t)]$
turn out to be constants, and hence in (14) we can replace $t_1$ by any
value of $t$ in $[t_0, t_1]$.

Remark 1. The maximum principle can often be used as a prescription for
constructing the optimal trajectory, in the following way: For every fixed
$\psi$ and $x$, we find the value of $u$ for which the expression

$$\sum_{\alpha=0}^n \psi_\alpha f^\alpha(x, u)$$

takes its maximum. If this determines $u$ as a single-valued function

$$u = u(\psi, x) \tag{15}$$

of $\psi$ and $x$, then, substituting (15) into the equations (8) and (9), we obtain
a closed system of $2(n + 1)$ equations involving $2(n + 1)$ unknown functions.
These are just the equations which have to be satisfied by the optimal
trajectory.
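The maximization step of this prescription can be sketched in code. The example below is only an illustration under stated assumptions: the control region is replaced by a finite grid of sample values, the maximizer is found by brute force, and the functions $f^\alpha$ are those of a double integrator with $f^0 = 1$; the names `pi_function`, `maximizing_control` and `f_aug` are hypothetical.

```python
def pi_function(psi, x, u, f_aug):
    """Pi(psi, x, u) = sum over alpha = 0..n of psi_alpha * f^alpha(x, u)."""
    return sum(ps * fa for ps, fa in zip(psi, f_aug(x, u)))

def maximizing_control(psi, x, f_aug, candidates):
    """Return the candidate u that maximizes Pi for fixed psi and x.

    A brute-force stand-in for the single-valued function u = u(psi, x);
    `candidates` is a finite sample of the control region.
    """
    return max(candidates, key=lambda u: pi_function(psi, x, u, f_aug))

def f_aug(x, u):
    """Assumed example: f^0 = 1, f^1 = x^2, f^2 = u, with |u| <= 1."""
    return [1.0, x[1], u]

grid = [-1.0 + 0.01 * k for k in range(201)]   # samples of the interval [-1, 1]
u_plus = maximizing_control([-1.0, 0.3, 0.5], [0.0, 2.0], f_aug, grid)
u_minus = maximizing_control([-1.0, 0.3, -0.5], [0.0, 2.0], f_aug, grid)
```

Since $\Pi$ is linear in $u$ in this example, the maximum is attained at an end point of the interval, with the sign of $u$ following the sign of $\psi_2$; this is the "bang-bang" behavior that reappears in Prob. 2 at the end of the appendix.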

Remark 2. For the simple $n$-dimensional variational problem discussed
on p. 220, the system (8), (9), or the equivalent system (11), together with the
maximum principle, reduces to the usual system of Euler equations. To see
this, consider the functional

$$\int_{t_0}^{t_1} f^0(x^1, \dots, x^n, u^1, \dots, u^n)\,dt \tag{16}$$

[cf. (5)], where

$$u^i = \frac{dx^i}{dt} \qquad (i = 1, \dots, n). \tag{17}$$

In this case, the function (10) is

$$\Pi(\psi, x, u) = \psi_0 f^0(x, u) + \sum_{\alpha=1}^n \psi_\alpha u^\alpha, \tag{18}$$

and the system (11) becomes

$$\frac{dx^0}{dt} = f^0(x, u), \qquad \frac{dx^i}{dt} = u^i, \qquad \frac{d\psi_0}{dt} = 0, \qquad \frac{d\psi_i}{dt} = -\psi_0 \frac{\partial f^0(x, u)}{\partial x^i},$$

where $i = 1, \dots, n$. Maximizing $\Pi(\psi, x, u)$, we find that

$$\frac{\partial \Pi}{\partial u^i} = \psi_0 \frac{\partial f^0(x, u)}{\partial u^i} + \psi_i = 0,$$

i.e.,

$$\psi_i = -\psi_0 \frac{\partial f^0(x, u)}{\partial u^i} \qquad (i = 1, \dots, n).$$

Since $d\psi_0/dt = 0$, we have $\psi_0 = \text{const}$, and hence

$$\frac{d}{dt}\left[\frac{\partial f^0(x, u)}{\partial u^i}\right] = \frac{\partial f^0(x, u)}{\partial x^i}.$$

This is just the system of Euler equations corresponding to the functional
(16), reduced to a system of first-order differential equations by introducing
the derivatives $dx^i/dt = u^i$ as new functions (cf. p. 68).
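The reduction can be followed through on a concrete case. The worked example below takes $n = 1$ and the integrand $f^0(x, u) = \tfrac{1}{2}(u^2 + x^2)$, which is an assumption chosen for illustration and does not appear in the text; it also assumes $\psi_0 \neq 0$.

```latex
% Assumed example: n = 1, f^0(x, u) = (u^2 + x^2)/2, with u = dx/dt.
\begin{align*}
\Pi(\psi, x, u) &= \tfrac{1}{2}\,\psi_0\,(u^2 + x^2) + \psi_1 u, \\
\frac{\partial \Pi}{\partial u} &= \psi_0 u + \psi_1 = 0
  \quad\Longrightarrow\quad \psi_1 = -\psi_0 u, \\
\frac{d\psi_1}{dt} &= -\psi_0 \frac{\partial f^0}{\partial x} = -\psi_0 x, \\
-\psi_0 \frac{du}{dt} &= -\psi_0 x
  \quad\Longrightarrow\quad \frac{d^2 x}{dt^2} = x .
\end{align*}
```

The last equation is the Euler equation of $\int \tfrac{1}{2}(x'^2 + x^2)\,dt$, as expected. Note that the stationary point of $\Pi$ is a maximum only if $\partial^2 \Pi / \partial u^2 = \psi_0 < 0$, which is consistent with the sign condition on $\psi_0$ in the maximum principle.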

Remark 3. In Appendix I, we have already encountered the fact that every 
propagation process can be described in two ways, either in terms of the 
trajectories along which the disturbance propagates (the “rays” in optics), 
or in terms of the motion of the wave front. The first approach leads to the 
canonical Euler equations (or, as in the example just considered, to the 
usual form of the Euler equations), i.e., a system of ordinary differential 
equations. The second approach leads to the Hamilton-Jacobi equation, 
i.e., a partial differential equation. Our maximum principle involves the 
study of trajectories, and in this sense is analogous to the method of canonical 
equations. The “wave front approach” to problems of optimal control 
has been developed by R. Bellman.⁶


5. Relation to Weierstrass’ necessary condition. We again consider the
simple functional (16), (17), where the function $\Pi(\psi, x, u)$ is given by (18).
Using (17), we can also write the functional (16) in the form

$$\int_{t_0}^{t_1} f^0(x^1, \dots, x^n, x^{1\prime}, \dots, x^{n\prime})\,dt. \tag{19}$$

The Weierstrass E-function for such a functional is⁷

$$E(x, x', z) = f^0(x, z) - f^0(x, x') - \sum_{i=1}^n (z^i - x^{i\prime})\,\frac{\partial f^0(x, x')}{\partial x^{i\prime}}. \tag{20}$$


6 See the relevant references cited in the Bibliography, p. 227. 

7 See p. 146. Note that $E$ is a function of three rather than four arguments, since (19)
is independent of $t$.
Using (18) and (20), we find that

$$\begin{aligned}
\Pi(\psi, x, z) &- \Pi(\psi, x, x') - \sum_{i=1}^n (z^i - x^{i\prime})\,\frac{\partial}{\partial u^i}\Pi(\psi, x, x') \\
&= \psi_0 f^0(x, z) - \psi_0 f^0(x, x') + \sum_{i=1}^n \psi_i (z^i - x^{i\prime}) - \sum_{i=1}^n (z^i - x^{i\prime})\left(\psi_0 \frac{\partial f^0(x, x')}{\partial x^{i\prime}} + \psi_i\right) \\
&= \psi_0 f^0(x, z) - \psi_0 f^0(x, x') - \sum_{i=1}^n (z^i - x^{i\prime})\,\psi_0 \frac{\partial f^0(x, x')}{\partial x^{i\prime}} = \psi_0 E(x, x', z).
\end{aligned} \tag{21}$$

If the function $\Pi$ achieves its maximum for values of $u = x'$ which are
interior points of the region $\Omega$, then

$$\frac{\partial \Pi}{\partial u^i} = 0 \qquad (i = 1, \dots, n)$$

at these points. Then, since $\psi_0 \le 0$, it follows from (21) that the condition
(13) is equivalent to the condition

$$E(x, x', z) \ge 0. \tag{22}$$

This is Weierstrass’ necessary condition, with which we are already familiar
(see p. 149). Thus, the maximum principle leads to another, independent
derivation of (22). It can be shown that the formula

$$\psi_0 E = \Pi(\psi, x, z) - \Pi(\psi, x, x') - \sum_{i=1}^n (z^i - x^{i\prime})\,\frac{\partial}{\partial u^i}\Pi(\psi, x, x')$$

remains true for variational problems subject to constraints, i.e., for more
general problems of optimal control.

We have just proved the equivalence of the maximum principle and
Weierstrass’ necessary condition (22) in the case where the set $\Omega$ of admissible
values of the control function $u(t)$ is open, i.e., where every point of $\Omega$ is an
interior point. In the case where the optimal control process involves values
of $u(t)$ lying on the boundary of the region $\Omega$, the condition (22) is in general
no longer valid. However, it can be shown that in such cases, the maximum
principle continues to apply.
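The identity relating $\Pi$ and the Weierstrass E-function can be checked numerically on a simple case. The sketch below assumes $n = 1$ and the illustrative integrand $f^0(x, u) = \tfrac{1}{2}(u^2 + x^2)$, for which the E-function works out to $\tfrac{1}{2}(z - x')^2$; the integrand and all function names are assumptions made for the example only.

```python
def f0(x, u):
    """Assumed integrand f^0(x, u) = (u^2 + x^2) / 2."""
    return 0.5 * (u * u + x * x)

def df0_du(x, u):
    """Partial derivative of f^0 with respect to u."""
    return u

def pi(psi0, psi1, x, u):
    """Pi(psi, x, u) = psi_0 * f^0(x, u) + psi_1 * u for n = 1."""
    return psi0 * f0(x, u) + psi1 * u

def dpi_du(psi0, psi1, x, u):
    """Partial derivative of Pi with respect to u."""
    return psi0 * df0_du(x, u) + psi1

def weierstrass_E(x, xp, z):
    """E(x, x', z) = f^0(x, z) - f^0(x, x') - (z - x') * df^0/dx'(x, x')."""
    return f0(x, z) - f0(x, xp) - (z - xp) * df0_du(x, xp)

def identity_gap(psi0, psi1, x, xp, z):
    """Difference between the two sides of the identity; should be ~0."""
    lhs = (pi(psi0, psi1, x, z) - pi(psi0, psi1, x, xp)
           - (z - xp) * dpi_du(psi0, psi1, x, xp))
    return lhs - psi0 * weierstrass_E(x, xp, z)

gap = identity_gap(-1.0, 0.7, 0.4, 1.3, -2.1)
E_value = weierstrass_E(0.4, 1.3, -2.1)
```

The terms containing $\psi_1$ cancel exactly, so the gap is zero up to rounding for any choice of $\psi_1$, and `E_value` is nonnegative, in agreement with Weierstrass' condition for this convex integrand.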


PROBLEMS 

1. State the maximum principle (p. 222) for the problem of “fastest motion”
or “time optimal problem,” where the functional (4) reduces to simply

$$J[U] = \int_{t_0}^{t_1} dt.$$

Ans. In this case, we write

$$H(\psi, x, u) = \sum_{\alpha=1}^n \psi_\alpha f^\alpha(x, u)$$

instead of (10), and in the system (11), $i$ need only range from 1 to $n$. The
function $\mathcal{H}$ in the maximum principle is now replaced by

$$H(\psi, x) = \sup_{u \in \Omega} H(\psi, x, u) = \mathcal{H}(\psi, x) - \psi_0.$$

Finally, the relations (14) are replaced by

$$H[\psi(t_1), x(t_1)] = -\psi_0 \ge 0,$$

which actually holds for any $t$ in $[t_0, t_1]$.


2. Consider the differential equation

$$\frac{d^2 x}{dt^2} = u, \tag{a}$$

where the control function $u$ obeys the condition $|u| \le 1$. Introducing the
“phase coordinates” $x^1$ and $x^2$, we can write (a) as a system

$$\frac{dx^1}{dt} = x^2, \qquad \frac{dx^2}{dt} = u. \tag{b}$$

What trajectory corresponds to the fastest motion from a given initial point
$x_0$ to the final point $x_1 = (0, 0)$?

Hint. The auxiliary variables $\psi_1$ and $\psi_2$ obey the equations

$$\frac{d\psi_1}{dt} = 0, \qquad \frac{d\psi_2}{dt} = -\psi_1.$$

By the maximum principle (modified in accordance with Prob. 1),

$$u(t) = \operatorname{sgn} \psi_2(t) = \operatorname{sgn}(c_2 - c_1 t),$$

where $c_1$ and $c_2$ are constants, $\operatorname{sgn} x = x/|x|$ and $u(t)$ can only change sign
once. Integrate the system (b) for $u = \pm 1$, and draw the corresponding
families of parabolas in the $(x^1, x^2)$ plane, analyzing the various possibilities
(corresponding to different initial positions $x_0$).


3. Study the same “time-optimal problem” for the equation

$$\frac{d^2 x}{dt^2} + x = u, \qquad |u| \le 1.$$

Hint. The appropriate system is now

$$\frac{dx^1}{dt} = x^2, \qquad \frac{dx^2}{dt} = -x^1 + u.$$

4. Study the same “time-optimal problem” for the system

$$\frac{dx^1}{dt} = x^2 + u^1, \qquad \frac{dx^2}{dt} = -x^1 + u^2,$$

where there are two control functions $u^1$, $u^2$ obeying the conditions $|u^1| \le 1$,
$|u^2| \le 1$.

Comment. For a detailed discussion of Probs. 2-4, see Chap. 1, Sec. 5
of the book cited on p. 218.


5. Verify the relations (14) for the simple variational problem (16) discussed 
in Remark 2, p. 223. 

Hint. Use Euler’s theorem on positive-homogeneous functions (Chap. 2, 
Prob. 6). 
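The integration suggested in the hint to Prob. 2 can be carried out in closed form, and a short script makes the parabolas easy to tabulate or plot. The sketch below is an illustration only: it uses the exact solution of system (b) for a constant control and checks that the quantity $x^1 - \tfrac{1}{2} u (x^2)^2$ stays constant along each arc when $u = \pm 1$; the function names are hypothetical.

```python
def bang_arc(x1, x2, u, t):
    """Exact state of system (b) of Prob. 2 after time t under constant u."""
    return x1 + x2 * t + 0.5 * u * t * t, x2 + u * t

def parabola_invariant(x1, x2, u):
    """Quantity x^1 - u (x^2)^2 / 2, constant along arcs with u = +1 or -1."""
    return x1 - 0.5 * u * x2 * x2

start = (1.0, -0.5)                       # arbitrary illustrative initial point
times = [0.25 * k for k in range(9)]      # sample times 0, 0.25, ..., 2

arc_plus = [bang_arc(start[0], start[1], +1.0, t) for t in times]
arc_minus = [bang_arc(start[0], start[1], -1.0, t) for t in times]

inv_plus = [parabola_invariant(a, b, +1.0) for a, b in arc_plus]
inv_minus = [parabola_invariant(a, b, -1.0) for a, b in arc_minus]
```

Each family therefore consists of parabolas $x^1 = \pm\tfrac{1}{2}(x^2)^2 + \text{const}$, traversed upward for $u = +1$ and downward for $u = -1$; the analysis of which arcs lead into the origin, and where the single switch occurs, is left as in the problem statement.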